
Poster Abstracts and Presentations


Poster Number 1

Strategic Locations for Rental Housing Investment in Ottawa

Karen Fletcher and Samee Shahood

We partnered with the City of Ottawa to combine rental data, geographic data on local amenities, and census data on the city's population to identify strategic locations for rental housing investment.
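The abstract does not say how the three data sources are combined. One common approach, shown here only as a hedged sketch, is a weighted overlay: rescale each indicator to [0, 1] across areas and take a weighted sum. The indicator names (`amenities`, `vacancy_rate`), the weights, and the choice to invert vacancy (treating a low vacancy rate as a sign of unmet rental demand) are all hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant indicator contributes 0 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def suitability(areas, weights, invert=()):
    """
    areas: dict area name -> dict indicator -> raw value
    weights: dict indicator -> weight (hypothetical policy priorities)
    invert: indicators where a lower raw value should score higher
    Returns dict area -> weighted score (in [0, 1] when weights sum to 1).
    """
    names = list(areas)
    scores = {name: 0.0 for name in names}
    for indicator, weight in weights.items():
        normalized = min_max([areas[name][indicator] for name in names])
        for name, value in zip(names, normalized):
            if indicator in invert:
                value = 1.0 - value
            scores[name] += weight * value
    return scores


# Hypothetical example: two areas, amenity counts and rental vacancy rates (%)
areas = {
    "Area A": {"amenities": 10, "vacancy_rate": 1.0},
    "Area B": {"amenities": 0, "vacancy_rate": 5.0},
}
ranking = suitability(areas, {"amenities": 0.5, "vacancy_rate": 0.5}, invert={"vacancy_rate"})
```

Census indicators such as population growth would enter the same way, as extra keys in each area's dictionary with their own weights.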



DATA 5000

Poster Number 3

Analyzing Ottawa’s Entertainment Landscape to Combat the “Boring City” Stereotype

Justin Zhang (School of Computer Science) and Hanna Khan (School of Journalism and Communication)

To address the common stereotype of Ottawa as a boring city, this study aims to identify gaps in Ottawa’s local entertainment landscape. We believe this negative perception stems from an uneven distribution of entertainment venues, a lack of diverse entertainment options beyond “common visits” such as tourist landmarks, and insufficient public transportation access. Our data-driven analysis of venue locations, public transportation networks, and population density reveals underserved areas and informs policymakers of sectors that require further urban development. Our methodology draws on several datasets. Custom OC Transpo data was collected to understand the public transit connectivity of Ottawa’s entertainment hubs, 2021 Statistics Canada Census data provided insights into Ottawa’s population density, and the distribution of existing entertainment hotspots, collected via Google's Places API, was used to find underserved areas. Our analysis uses geospatial techniques, such as clustering with H3 indexing and the Getis-Ord statistic, to identify imbalances in transportation, entertainment, and population across Ottawa. We also developed an open-access geospatial tool to visualize these points of interest. Our findings indicate that certain areas, such as Central Orleans, Findlay Creek, and Carlingwood, have adequate public transit and high population density but lack a diverse range of entertainment options. Ultimately, this research seeks to give the city’s Night Mayor and planners data-driven recommendations to strategically enhance entertainment options, improve urban planning, and positively transform Ottawa’s image.
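For readers unfamiliar with the Getis-Ord statistic, the sketch below computes the Gi* hot-spot z-score for a set of grid cells. It assumes binary contiguity weights with each cell counted in its own neighbourhood; the abstract does not specify the weighting scheme or whether the Gi or Gi* variant was used. The integer cell IDs and the `neighbors` map are hypothetical stand-ins for H3 cell indices and their ring-1 neighbours (which an H3 library would supply); no H3 library is called here.

```python
import math


def getis_ord_gi_star(values, neighbors):
    """
    Getis-Ord Gi* z-scores with binary weights.

    values: dict cell_id -> observed count (e.g. venues per hexagon)
    neighbors: dict cell_id -> iterable of adjacent cell_ids
    Each cell is included in its own neighbourhood (the "star" variant).
    Positive scores mark hot spots, negative scores cold spots.
    Undefined if all counts are equal or a neighbourhood covers every cell.
    """
    cells = list(values)
    n = len(cells)
    mean = sum(values.values()) / n
    s = math.sqrt(sum(v * v for v in values.values()) / n - mean ** 2)
    scores = {}
    for c in cells:
        # Neighbours outside the study area (not in `values`) are ignored
        hood = {c} | {nb for nb in neighbors.get(c, ()) if nb in values}
        w_sum = len(hood)  # sum of 0/1 weights, which also equals the sum of squared weights
        local_sum = sum(values[j] for j in hood)
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores[c] = numerator / denominator
    return scores


# Hypothetical strip of seven cells with a venue cluster around cell 3
counts = {0: 0, 1: 0, 2: 1, 3: 9, 4: 1, 5: 0, 6: 0}
adjacent = {i: [i - 1, i + 1] for i in counts}
z = getis_ord_gi_star(counts, adjacent)
```

In this toy strip, cell 3 receives the highest score and the edge cells receive negative scores; with real H3 cells, the same function would flag contiguous clusters of venues (or of transit stops, or of residents) rather than isolated high values.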



DATA 5000

Poster Number 4

Multi-LSTM MACD Forecasting for Simulated ETF Trading

Matteo Sotelo

This study explores the application of Long Short-Term Memory (LSTM) neural networks to forecasting MACD momentum signals for ETF trading, integrating both technical and macroeconomic indicators. Using more than 20 years of historical data from Yahoo Finance and the FRED API, the model predicts the moving average convergence/divergence (MACD) line and its signal line (the nine-day exponential moving average of the MACD) for the S&P 500 ETF (SPY) and uses these forecasts to generate trading signals. These signals are then executed in a custom backtesting framework and benchmarked against a buy-and-hold strategy. In backtesting, the system achieves a 113.16% return compared with 33.05% for buy-and-hold, suggesting that combining machine learning models with broad macroeconomic context could support automated ETF investment strategies.
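The MACD quantities the model forecasts can be computed directly from prices. The sketch below uses the conventional 12- and 26-day EMAs for the MACD line (the abstract states only the nine-day signal span) and a simple long/flat rule: hold the ETF while the MACD is above its signal line. In the poster's setup the LSTM forecasts would replace the observed MACD values; the trading rule, the cost-free backtest, and all function names here are assumptions, not the author's framework.

```python
def ema(series, span):
    """Exponential moving average with smoothing factor 2 / (span + 1), seeded with the first value."""
    alpha = 2 / (span + 1)
    out = [series[0]]
    for value in series[1:]:
        out.append(alpha * value + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """MACD line (fast EMA minus slow EMA) and its signal line (EMA of the MACD line)."""
    macd_line = [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]
    return macd_line, ema(macd_line, signal)


def positions_from_crossover(macd_line, signal_line):
    """1 = long, 0 = flat: stay long while the MACD line is above the signal line."""
    return [1 if m > s else 0 for m, s in zip(macd_line, signal_line)]


def strategy_return(prices, positions):
    """Total return when the position chosen at day t's close is held until day t+1 (no costs)."""
    equity = 1.0
    for t in range(len(prices) - 1):
        if positions[t]:
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0


def buy_and_hold_return(prices):
    """Return from buying at the first price and selling at the last."""
    return prices[-1] / prices[0] - 1.0
```

To backtest forecasts, `positions_from_crossover` would receive the predicted MACD and signal series instead. Deciding at day t's close but earning only the move from t to t+1 keeps the comparison with `buy_and_hold_return` free of look-ahead bias.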



DATA 5000

Poster Number 9

#WOMAN_MOMENT: Measuring Habitability of Online Communities Through a Multimodal Data Science Lens

Kayleigh Lewis, Norman Patterson School of International Affairs (NPSIA), and Zeina Omar, School of Computer Science

This project investigates what makes an online space habitable, using feminist theory to frame habitability as the ability to safely exist, express oneself, and engage on digital platforms. Guided by a subalternative science framework, we combine critical domain insights with machine learning techniques to model safety, moderation, and participation across Reddit and YouTube. We analyzed ~1 million Reddit posts and comments and more than 43,000 YouTube comments, scoring each with an ensemble of BERT-based models (RoBERTa, HateBERT, ToxicBERT) to detect both explicit and ambiguous harms. We tested hypotheses in three domains: [A] YouTube: women-tagged content consistently received more toxicity, most notably in HateBERT scores. [B] Reddit: clusters containing more restricted subreddits showed higher toxicity, while bridge communities with higher centrality tended to be safer. [C] Composite index: a weighted ensemble score predicted moderation status and community stability better than any single model. These findings reveal that habitability is measurable, multidimensional, and sensitive to social context. By tuning model weights and interpreting toxicity relationally, platforms can move beyond blunt moderation toward adaptive, equity-focused design.
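The composite index is described only as a weighted ensemble score. A minimal sketch, assuming each model emits a toxicity probability in [0, 1] and the composite is their weighted mean, looks like this. The model keys, the weights, and the max-minus-min disagreement measure (one possible proxy for "ambiguous" harms) are hypothetical.

```python
def composite_score(scores, weights):
    """Weighted mean of per-model toxicity probabilities; weights need not sum to 1."""
    total_weight = sum(weights[model] for model in scores)
    return sum(weights[model] * p for model, p in scores.items()) / total_weight


def model_spread(scores):
    """Largest minus smallest model score: a hypothetical flag for cases where the models disagree."""
    return max(scores.values()) - min(scores.values())


# Hypothetical scores for one comment and hypothetical tuned weights
comment_scores = {"roberta": 0.2, "hatebert": 0.7, "toxicbert": 0.4}
weights = {"roberta": 1.0, "hatebert": 2.0, "toxicbert": 1.0}
risk = composite_score(comment_scores, weights)
disagreement = model_spread(comment_scores)
```

Tuning the weights, for example against moderation outcomes, is what would let such a composite outperform any single model; the averaging itself is deliberately simple.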



DATA 5000

Poster Number 2

Leveraging Learned Representations and Multitask Learning for Lysine Methylation Site Discovery

Francois Charih (Carleton), Kyle Biggar (Carleton), James R. Green (Carleton)

Lysine methylation is a post-translational modification of proteins that has been associated with a number of cancers. The full extent of this modification in human cells is not known, and there is a strong incentive to identify new lysine methylation sites, as doing so could uncover new anti-cancer drug targets. In this work, we investigate the use of protein language model representations and multitask learning to identify novel lysine methylation sites. We show that using representations generated by protein language models is associated with significant improvements in prediction accuracy. We also demonstrate that prior knowledge about other post-translational modifications can be incorporated into the model to further enhance its performance. Altogether, we believe that our model will enable the discovery of a wide range of novel methylation sites, many of which are expected to be clinically relevant.
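In a multitask setup, a shared network reads the protein language model's embedding at each lysine and outputs one prediction per modification type, and the losses for all tasks are combined. The sketch below shows only two pieces of that idea: selecting candidate lysine positions and a joint binary cross-entropy loss that skips sites lacking a label for a given task. The task names (`methyl`, `acetyl`), the task weights, and the masking scheme are assumptions, not details from the poster.

```python
import math


def lysine_positions(sequence):
    """0-based indices of lysine (K) residues, i.e. candidate methylation sites."""
    return [i for i, aa in enumerate(sequence) if aa == "K"]


def masked_multitask_bce(predictions, labels, task_weights, eps=1e-7):
    """
    predictions[task][i]: predicted probability that site i carries that modification
    labels[task][i]: 1, 0, or None when site i is unannotated for that task
    Returns the weighted sum of per-task mean losses over annotated sites only.
    """
    total = 0.0
    for task, weight in task_weights.items():
        losses = []
        for p, y in zip(predictions[task], labels[task]):
            if y is None:
                continue  # no label for this task: contributes nothing
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            losses.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
        if losses:
            total += weight * sum(losses) / len(losses)
    return total
```

Masking matters because a site annotated for, say, acetylation is usually unannotated rather than negative for methylation; treating missing labels as zeros would teach the model false negatives.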



General

Poster Number 5

Genetic Modifiers of Treatment Responses in Duchenne Muscular Dystrophy

Fatemeh Ahmadiharchegani¹, Eric P. Hoffman³, Daniele Sabbatini², Elena Pegoraro², Luca Bello², Michela Guglieri⁴, Paula Clemens⁵, Utkarsh J. Dang¹

¹ Carleton University, Ottawa, Canada; ² University of Padova, Italy; ³ Binghamton University, NY, USA; ⁴ Newcastle University, UK; ⁵ University of Pittsburgh School of Medicine, Pittsburgh, USA

Duchenne muscular dystrophy (DMD) is a fatal X-linked disorder that primarily affects boys, causing progressive muscle degeneration, growth stunting, and a shortened lifespan. Although many clinical trials focus on the second decade of life, our study uses data from the first decade to investigate the impact of genetic modifiers on treatment safety outcomes. We explored whether certain gene polymorphisms outside the DMD gene influence critical safety measures: height, bone biomarkers, and cortisol levels. Due to dataset limitations, we focused on specific single nucleotide polymorphisms (SNPs) in the vitamin D receptor (VDR) gene and the PDGFD gene. Our results suggest that particular VDR polymorphisms may protect against the growth stunting and negative bone biomarker changes commonly seen with prednisone treatment. This highlights the potential of genetic profiling to optimize therapeutic strategies for young boys with DMD. Further research with larger cohorts is needed to confirm these findings.
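The abstract does not describe the statistical model. A common first step in SNP association analysis, shown here only as an illustration and not as the authors' analysis, is to code each genotype additively (0, 1, or 2 copies of an effect allele) and regress an outcome, such as change in height, on that count. The genotypes and allele letters below are hypothetical, and a real analysis would also adjust for covariates such as age and treatment, which this sketch omits.

```python
def additive_code(genotype, effect_allele):
    """Number of copies (0, 1 or 2) of the effect allele in a two-allele genotype string such as "AG"."""
    return genotype.count(effect_allele)


def ols_slope(x, y):
    """Least-squares slope of y on x: the estimated change in outcome per additional effect allele."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical genotypes at one SNP, coded for effect allele "G"
genotypes = ["AA", "AG", "GG", "AG"]
dosages = [additive_code(g, "G") for g in genotypes]
```

With small cohorts such a slope is noisy, which is consistent with the abstract's call for larger studies before drawing firm conclusions.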



General

Poster Number 6

Now-Casting Job Vacancies Across Canadian Industries

Mohammad Alipourlangouri, Zahra Mousavi, Ali Babapour

Because Statistics Canada publishes labour market indicators with a 40–50 day delay, this project addresses the need for faster, real-time labour market insights. We aim to now-cast job vacancy rates across 21 Canadian industries using (S)ARIMA(X) time series models.
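(S)ARIMA(X) models are usually fit with a statistics library. To keep the core idea visible, the sketch below fits the simplest relative, an ARX(1) model y_t = c + φ·y_{t-1} + β·x_t, by ordinary least squares, then now-casts the current period from a timely exogenous indicator x_t that is available before the official vacancy figure. The model order, the indicator, and the function names are assumptions; the differencing, moving-average, and seasonal terms of a full SARIMAX model are omitted.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_arx1(y, x):
    """OLS estimates (c, phi, beta) for y_t = c + phi * y_{t-1} + beta * x_t."""
    rows = [[1.0, y[t - 1], x[t]] for t in range(1, len(y))]
    targets = [y[t] for t in range(1, len(y))]
    # Normal equations: (X^T X) theta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * target for r, target in zip(rows, targets)) for i in range(3)]
    return solve(xtx, xty)


def nowcast(params, last_official_value, current_indicator):
    """Estimate the current period before its official figure is published."""
    c, phi, beta = params
    return c + phi * last_official_value + beta * current_indicator
```

A library implementation would also provide standard errors and forecast intervals, which matter when a now-cast is used to anticipate an official release.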



General

Poster Number 7

Selectorate Theory: How Do Leaders Stay in Power?

Victor Li (Carleton University, School of Computer Science)

Selectorate Theory, developed by Bueno de Mesquita et al., offers a compelling framework for analyzing how political leaders maintain power through strategic resource distribution. The central research question is: how should leaders allocate their resources to maintain power? According to the theory, the answer depends on the size of the winning coalition relative to the selectorate. Leaders who rely on small coalitions tend to distribute private goods to ensure loyalty, while those needing broad support invest more heavily in public goods. This project uses a Python-based simulation to model leader behaviour across different regime types by varying the sizes of the selectorate and the winning coalition. The model explores how leaders allocate constrained resources between private rewards and public investments, factoring in loyalty norms and revenue availability. The results are consistent with the theory's predictions: leaders with small coalitions focus on private goods and enjoy prolonged tenure, whereas those with large coalitions prioritize public goods, leading to higher citizen welfare but greater political competition.
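The abstract does not give the simulation's payoff functions. As a hedged sketch of the core trade-off only, assume a coalition member values public goods with diminishing returns (the square root of public spending) and receives an equal 1/W share of private spending; the leader then picks the share of revenue for public goods that maximizes that payoff. These functional forms are assumptions, and the selectorate size S, the loyalty norm, and tenure dynamics are not modelled here.

```python
import math


def coalition_member_utility(public_spend, revenue, coalition_size):
    """Hypothetical payoff to one winning-coalition member.

    Public goods reach everyone but with diminishing returns (square root);
    private goods are split evenly among the W coalition members.
    """
    private_spend = revenue - public_spend
    return math.sqrt(public_spend) + private_spend / coalition_size


def best_public_share(revenue, coalition_size, steps=100):
    """Grid-search the share of revenue spent on public goods that maximizes coalition utility."""
    shares = [i / steps for i in range(steps + 1)]
    return max(shares, key=lambda s: coalition_member_utility(s * revenue, revenue, coalition_size))


small_coalition = best_public_share(revenue=100, coalition_size=2)
large_coalition = best_public_share(revenue=100, coalition_size=50)
```

Under these assumptions the optimum is public spending of W²/4, capped at total revenue, so with revenue 100 a leader with W = 2 spends 1% on public goods while a leader with W = 50 spends everything on them, matching the qualitative pattern the theory predicts.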



General

Poster Number 8

Timing the Continuum: An Agent-Based Model of Housing Continuums and Their Effects on Housing Insecurity

Malick Sylla, Carleton University

This honours project set out to develop an agent-based model to investigate how construction timing across the housing continuum affects housing insecurity and homelessness. Despite adequate overall housing supply, the model's simulations reveal that delays in constructing market ownership housing can create a "housing gridlock" effect that cascades through the entire continuum. This gridlock increases pressure on the lower tiers of the continuum, effectively blocking lower-income households from accessing housing that is affordable to them and resulting in more frequent onsets of housing insecurity and homelessness. More work is needed to verify and refine this emergent relationship, but the findings suggest that effective housing policy may need to consider not just the quantity of housing built but, critically, when and in what sequence different housing types are delivered across the continuum.
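The gridlock mechanism can be illustrated with a deliberately tiny model, which is not the honours project's model: three tiers (0 = lowest-cost housing, 2 = market ownership) start full, a few households per tier try to move up one tier each step, and new lower-income households need a vacancy in tier 0. Both scenarios add the same 10 ownership units and differ only in when those units open. The tier definitions, movement rules, and all parameters are hypothetical.

```python
def simulate(build_schedule, steps, capacity, occupied, arrivals=2, movers=2):
    """
    Toy housing-continuum model; tiers run from lowest (index 0) to ownership (last index).

    build_schedule: dict step -> list of new units per tier
    Each step: new units open; up to `movers` households per tier move up one tier
    if a unit is vacant (processed from the top so freed units cascade downward);
    then `arrivals` new households seek a unit in tier 0, and those who cannot be
    housed wait in an unhoused queue.
    Returns the total household-steps spent unhoused.
    """
    capacity = list(capacity)
    occupied = list(occupied)
    queue = 0
    unhoused_total = 0
    for t in range(steps):
        for k, units in enumerate(build_schedule.get(t, [0] * len(capacity))):
            capacity[k] += units
        for k in range(len(capacity) - 2, -1, -1):
            vacancies_above = capacity[k + 1] - occupied[k + 1]
            moves = min(movers, occupied[k], vacancies_above)
            occupied[k] -= moves
            occupied[k + 1] += moves
        queue += arrivals
        housed = min(queue, capacity[0] - occupied[0])
        occupied[0] += housed
        queue -= housed
        unhoused_total += queue
    return unhoused_total


full = [10, 10, 10]
early = simulate({0: [0, 0, 10]}, steps=10, capacity=full, occupied=full)  # 30
late = simulate({8: [0, 0, 10]}, steps=10, capacity=full, occupied=full)   # 104
```

Because every upward move frees a unit one tier down, delaying the ownership units stalls the whole chain, and the unhoused queue grows even though total supply by the end is identical in both scenarios.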



General