
Poster Abstracts


Poster #1: Marwan Ghalib

A Federated Temporal Fusion Transformer for Residential Short-Term Electric Load Forecasting

Marwan Ghalib, Dr. Zied Bouida, Prof. Mohamed Ibnkahla

Privacy concerns arise from sharing residential electric load data. For instance, this data can be hijacked and deductions can be made about home occupancy. Anonymizing the data does not fully solve the issue, as there have been several successful attempts at re-identifying individuals from anonymized data. Federated Learning (FL) trains a global model across several clients without sharing data; instead, only model weights are shared. FL therefore preserves the privacy of end users. FL also allows insights to be gained from data that researchers and industry typically cannot access due to differing privacy mandates. Training a common model using data from different jurisdictions thus becomes possible, avoiding inefficiencies in the machine learning process and the loss of valuable learnings that comes from training separate models. Maintaining privacy should not come at the expense of a forecasting model's accuracy. While transformers have traditionally been used for Natural Language Processing (NLP), they have also shown improved accuracy for time-series forecasting. This can be attributed to the transformer architecture's ability to avoid long-range dependency issues, as well as its self-attention mechanism, which, along with positional embeddings, captures the relationships between different data points. A Temporal Fusion Transformer is a deep learning model that combines the strengths of transformers and Long Short-Term Memory (LSTM) networks to handle time-series data more effectively. Through this research, a lightweight privacy-preserving training process is proposed to forecast residential electric load using a Temporal Fusion Transformer, Federated Learning, and Quantization.
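As a rough illustration of the federated step described above, here is a minimal sketch of one round of federated averaging over client model weights. It is not the authors' implementation: the toy two-layer weight lists, client counts, and size-based weighting are assumptions for the example; a real system would aggregate Temporal Fusion Transformer parameters and quantize them before transmission.

```python
# Minimal sketch of one round of federated averaging (FedAvg-style).
# Hypothetical setup: each client (home) trains locally and returns its
# model weights; the server averages them, weighted by local dataset size.
import numpy as np

def federated_average(client_weights, client_sizes):
    """Average per-layer weights across clients, weighted by data size."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        weighted = [w[layer] * (n / total) for w, n in zip(client_weights, client_sizes)]
        averaged.append(np.sum(weighted, axis=0))
    return averaged

# Example: three homes, each contributing a toy two-layer model.
clients = [[np.random.randn(4, 4), np.random.randn(4)] for _ in range(3)]
sizes = [1200, 800, 2000]   # number of local load readings per home
global_weights = federated_average(clients, sizes)
print([w.shape for w in global_weights])
```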

Poster #2: Sam Serdah

Predicting the 10-year risk of developing coronary heart disease (CHD) using machine learning techniques

Sam Serdah and Darshak Patel

Introduction: According to the World Health Organization (WHO), the prevalence of heart disease is increasing worldwide, leading to more than 12 million deaths per year [1]. In most developing countries, heart disease is reported to be the leading cause of death, accounting for roughly 50% of deaths [1]. Early intervention that assesses current lifestyle choices and their effects on developing heart disease is required to lower the prevalence of death due to heart disease [1]. This project aims to identify the most important heart disease risk attributes and use logistic regression to estimate the total risk. A dataset from Kaggle that contains attributes related to heart disease will be used to conduct this project [2]. Research Question: Can we predict whether a patient is at risk of developing coronary heart disease (CHD) within 10 years based on 15 attributes covering demographic, behavioural, and medical risk factors? Methodology: This is a six-step process for data analysis. The first step is data cleaning, which involves removing inaccurate, damaged, improperly formatted, duplicate, or missing data from the dataset. The second step is descriptive statistics, which involves summarising features of the data using summary statistics. The third step is statistical inference, which involves hypothesis testing with statistical methods that provide significance, such as logistic regression. The fourth step is graphical representation, which involves visualizing data using graphs and charts. The fifth step is result interpretation, which includes a statement of the main conclusions, an analysis of those results, and the analysis's advantages and disadvantages. The sixth and final step is conclusions, which involves summarising all results of the analysis and writing conclusions based on them. Expected Outcome: The expected result of this project is a model that predicts whether an individual is currently at high risk and whether they will develop CHD within the next 10 years. References: 1) "Cardiovascular diseases (CVDs)," World Health Organization. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds). [Accessed: 08-Dec-2022]. 2) Ganteng, C. (2020, September 22). Cardiovascular study dataset. Kaggle. Retrieved October 15, 2022, from https://www.kaggle.com/datasets/christofel04/cardiovascular-study-dataset-predict-heart-disea
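For readers unfamiliar with the modelling step, the snippet below is a hedged sketch of fitting a logistic regression to a CHD dataset of this kind; the file name and the "TenYearCHD" label column are assumptions for illustration, not necessarily the exact schema of the Kaggle dataset.

```python
# Hedged sketch: logistic regression on a 10-year CHD risk dataset.
# File and column names ("framingham.csv", "TenYearCHD") are assumed.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("framingham.csv").dropna()            # step 1: data cleaning
X = df.drop(columns=["TenYearCHD"])                    # 15 risk-factor attributes
y = df["TenYearCHD"]                                   # 1 = develops CHD within 10 years

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Coefficient magnitudes give a first look at the most influential risk factors.
top = sorted(zip(X.columns, model.coef_[0]), key=lambda t: abs(t[1]), reverse=True)
print(top[:5])
```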

Poster #3: Mohamed El Shehaby

Can we fool the defender? The Practicality of Adversarial Attacks Against Network Security

Mohamed el Shehaby, Ashraf Matrawy, Systems and Computer Engineering Carleton University, School of Information Technology Carleton University

Machine Learning (ML) applications are now present in every corner of modern life. ML has found its way into network security applications due to its (1) automated nature and (2) ability to process and classify high volumes of data with high accuracy. However, one massive ML challenge looms on the horizon: adversarial attacks. There is no doubt that these attacks are a menace to ML. Nevertheless, due to the settings and details surrounding network security entities, and unlike computer vision, the practicality of adversarial attacks against some network security systems might be questionable. Therefore, we present our work in progress: a taxonomy of the practicality issues of adversarial attacks against network security. We then apply our taxonomy to multiple ML-based network security entities (network intrusion detection systems, spam detection systems, malware detection systems, and phishing detection systems) to answer the question: when should we worry about adversarial attacks in the network security domain?

Poster #4: Dana Haj Hussein

Designing for the Unknown: Paving the Way Towards Intelligent and Autonomous Resource Management in Future Internet of Things Systems

Dana Haj Hussein (Ph.D. Candidate), Professor Mohamed Ibnkahla

There is a consensus among research and industry that intelligent resource management algorithms are inevitably needed to address the foreseen complexities of future Internet of Things (IoT) systems. Nevertheless, IoT data scarcity remains one of the main challenges hindering the fast adoption of intelligent solutions. Indeed, IoT traffic data is invaluable for training intelligent Machine Learning (ML)-based algorithms. However, large-scale deployment of IoT networks, such as smart cities, is still ongoing. In addition, sharing IoT data can be restricted by the privacy concerns associated with its use. The contributions of this research are twofold. First, a novel IoT traffic modeling framework, namely the Tiered Markov Modulated Stochastic Process (TMMSP), is proposed to address IoT traffic data scarcity. In particular, real IoT traffic data is studied to develop the TMMSP framework, which captures the traffic characteristics of real IoT systems. Consequently, the TMMSP model can be used to generate synthetic traffic datasets that mimic the behavior of real IoT traffic and can be used for training ML algorithms. Second, an intelligent and autonomous edge resource slicing (IAES) framework that uses reinforcement learning is proposed to dynamically optimize resource allocation in edge-enabled IoT networks. The IAES utilizes IoT traffic data generated by the TMMSP framework to train its neural networks. In a simulated environment, the IAES algorithm has achieved around 40%, 90%, and 93% reductions in the system's running cost compared to two variations of dynamic resource allocation models and a static model, respectively.
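As a toy illustration of Markov-modulated traffic synthesis (not the TMMSP framework itself), the sketch below generates per-interval packet counts from a two-state process that switches between idle and burst regimes; the rates and switching probabilities are assumed values chosen only for the example.

```python
# Toy two-state Markov-modulated Poisson process: the hidden state switches
# between "idle" and "burst", producing synthetic per-interval packet counts.
import numpy as np

rng = np.random.default_rng(0)
rates = {"idle": 0.5, "burst": 20.0}          # assumed mean packets per interval
transition = {"idle": 0.05, "burst": 0.30}    # assumed probability of switching state

def synthesize_traffic(n_intervals=1000):
    state, counts = "idle", []
    for _ in range(n_intervals):
        counts.append(rng.poisson(rates[state]))
        if rng.random() < transition[state]:
            state = "burst" if state == "idle" else "idle"
    return np.array(counts)

traffic = synthesize_traffic()
print(traffic[:20], traffic.mean())
```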

Poster #5: Parul Tambe

Analyzing Economic Conditions to Predict Loan Default

Chris Fitzpatrick (Norman Paterson School of International Affairs), Omar Imran (Systems and Computer Engineering), Parul Tambe

The aim of the project was to build a machine-learning model that could be used to forecast whether a loan is at risk of default based on evolving economic conditions and to inform consumers about the risk associated with the loans they hold. Changing economic conditions such as the unemployment rate, Consumer Confidence Index (CCI), Gross Domestic Product (GDP) growth rate, and inflation rate can affect individuals' and businesses' ability to make their loan payments, leading to an increase in observed loan defaults. The dataset for this project is compiled from two sources: Lending Club, the primary dataset providing peer-to-peer (P2P) lending loan information, and the Federal Reserve Economic Data (FRED), the auxiliary datasets that provide information about economic conditions throughout the period of the loans. Two different dimensionality reduction techniques were used: Principal Component Analysis (PCA) and Random Forest (RF). Three different algorithms were chosen to classify the loans: K-Nearest Neighbours (KNN), Decision Trees (DT), and Artificial Neural Networks (ANN). The ultimate goal was to compare the different combinations of dimensionality reduction techniques and algorithms to select the best pair. Overall, outside economic factors were shown to be of use when predicting loan default and delinquency. These economic features contributed to building a model with high recall, accuracy, and precision. Through the features identified, this project confirms that outside economic factors do have an impact on the repayment of loans. Understanding the specific mechanisms by which these factors impact loan repayment is an area for further research.
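A hedged sketch of one combination from the comparison (PCA for dimensionality reduction followed by KNN) is shown below; the synthetic feature matrix and labels are placeholders for the Lending Club and FRED features, and the component and neighbour counts are illustrative choices rather than the project's tuned settings.

```python
# Hedged sketch: PCA dimensionality reduction feeding a KNN classifier.
# X and y below are random placeholders for the loan + economic features.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 30))                 # placeholder loan + economic features
y = rng.integers(0, 2, size=500)               # placeholder default labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
pipe = Pipeline([("scale", StandardScaler()),
                 ("pca", PCA(n_components=10)),
                 ("knn", KNeighborsClassifier(n_neighbors=15))]).fit(X_tr, y_tr)
pred = pipe.predict(X_te)
print("recall:", recall_score(y_te, pred, zero_division=0),
      "precision:", precision_score(y_te, pred, zero_division=0))
```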

Poster #6: Mahitha Sangem

Prediction of water potability

Hannah Johnston - PhD student, CSIT – Research focus on UX for Human-AI art collaboration, Carleton University. Mahitha Sangem - MEng student, SCE – Data Science Specialization, Carleton University.

Safe drinking water is important for everyone. We built a binary classifier to recognize water potability (determine whether or not water is potable). In addition to potability, the dataset contains the following water quality metrics (features): pH value, hardness, solids (total dissolved solids), chloramines, sulphate, conductivity, organic carbon, trihalomethanes, and turbidity. The dataset contains 3276 different water samples. We preprocessed the data to handle missing values, visualise features, and split and scale the dataset. To address an observed class imbalance, we up-sampled using ADASYN. To perform the water potability prediction, we implemented several classical machine learning models, including KNN, decision tree, SVM, and logistic regression. We improved performance through the use of the ensemble techniques stacking and voting, using the top-performing classifiers as base learners. We also implemented several deep learning models. Each of the classifiers was implemented with hyperparameter tuning alone and with ADASYN plus hyperparameter tuning. A majority of the classifiers performed better when ADASYN was used. We visualised the final results and compared the F1-scores, precision, recall, and accuracy. F1-score was prioritized because both false negatives and false positives can have serious negative consequences, leading either to wastage of potable water or to health problems from drinking water unfit for consumption.
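The snippet below sketches the imbalance-handling and ensembling steps (ADASYN followed by a stacking classifier). The generated data is a stand-in for the nine water-quality features, and the base learners and hyperparameters are illustrative rather than the tuned models reported above.

```python
# Hedged sketch: ADASYN over-sampling followed by a stacking ensemble.
# make_classification stands in for the cleaned, scaled water-quality data.
from imblearn.over_sampling import ADASYN
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=3000, n_features=9, weights=[0.65],
                           random_state=0)      # imbalanced stand-in dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = ADASYN(random_state=0).fit_resample(X_tr, y_tr)   # fix imbalance
stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000)).fit(X_bal, y_bal)
print("F1:", f1_score(y_te, stack.predict(X_te)))
```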

Poster #7: Sami Ortiz Huayhua

TikTok Privacy and Young Adults

Sami Ortiz Huayhua. Supervisors: Robert Biddle, Sonia Chiasson

Social media platforms such as TikTok permeate the daily lives of young adults. The pandemic shifted our social interactions to the digital world, popularizing social media platforms as aids to work, study, and socializing. In this context, TikTok has become the dominant space for audio-visual content. Through semi-structured interviews and questionnaires with 12 content creators and 13 content followers, we explore young adults' perceptions, behaviors, and methods of managing privacy on the social media platform TikTok. Using thematic analysis, our findings show that TikTok has been adopted into routine conversations and shapes culture outside the platform. The results also highlight that young adults view data collection as a routine procedure to which they are desensitized. However, they express concerns around surveillance and cross-platform tracking. Participants employed privacy-enhancing behaviors to address fears about other users and to avoid possible targeted negative reactions on the app.

Poster #8: Tariq Elbahrawy

Ottawa Client Experience Chatbot

Tariq ElBahrawy, Dharanidhur Gonela, Egahi Gideon Emmaogboji

Introduction: The client experience team for ServiceOttawa wanted to build a chatbot for the Ottawa.ca website. As a result, we have teamed up with them to create a chatbot that answers residents' questions and guides them through the Ottawa.ca website. The dataset for this project is created manually by writing questions and answers from two sections of the Ottawa.ca website. The chatbot then classifies these questions using a deep learning model and provides an answer to the user. Methodology: The dataset is created by writing questions and answers manually from two sections of the Ottawa.ca website and storing them in a JSON file. The file contains tags (classes) such as greeting, patterns (what the user will type) such as "Hi", and responses (the chatbot's replies) such as "Hi there. How are you feeling today?". So far we have written 200 questions and answers from one section of the website, and we are working on writing questions for one more section. We then did the following: • Used Natural Language Processing (NLP) techniques to convert patterns (what the user will type), such as "Hi", into lists of 0's and 1's (a bag of words) for our deep learning model. • Trained our deep learning model to classify these sentences. For example, if the user types "Hi", the model classifies it as a greeting and returns a response from the greeting entries in the JSON file. Results: Our model classifies patterns (what the user will type) with 99.5% accuracy and 0.00425 loss.
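A hedged sketch of the intent-classification idea is below. The intents dictionary is a made-up miniature of the JSON file described above, and a small scikit-learn neural network stands in for the project's deep learning model.

```python
# Hedged sketch: bag-of-words intent classification for a FAQ-style chatbot.
# The intents below are hypothetical examples, not the project's dataset.
import random
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

intents = {
    "greeting": {"patterns": ["Hi", "Hello", "Hey there"],
                 "responses": ["Hi there. How are you feeling today?"]},
    "garbage":  {"patterns": ["When is garbage pickup?", "Garbage collection day"],
                 "responses": ["Garbage is collected weekly; check your address on Ottawa.ca."]},
}

texts  = [p for tag in intents for p in intents[tag]["patterns"]]
labels = [tag for tag in intents for _ in intents[tag]["patterns"]]

vec = CountVectorizer(binary=True)             # bag of words: lists of 0's and 1's
X = vec.fit_transform(texts)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X, labels)

def reply(user_text):
    tag = clf.predict(vec.transform([user_text]))[0]
    return random.choice(intents[tag]["responses"])

print(reply("hello"))
```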

Poster #9: Andre Telfer

Automated Analysis Pipelines in Behavioral Neuroscience

Andre Telfer (1), Brenna McAuley (1), Thea Jewer (1), Andrea Smith (1), Frances Sherratt (1), Dr. Vern Lewis (1), Dr. Shawn Hayley (1), Dr. Argel Aguilar-Valles (1), Dr. John Lewis (2), Dr. Oliver van Kaick (3), Dr. Alfonso Abizaid (1); 1: Carleton University Department of Neuroscience, 2: Carleton University Department of Computer Science, 3: University of Ottawa Department of Biology.

In the 21st century, understanding the human brain remains a distant challenge with wide-ranging impacts across fields in the health sciences, social sciences, and Artificial Intelligence. Mouse models have been an important method for researchers to study human neurological traits and disorders. These studies can produce massive amounts of multimodal data that are often quantified manually over the course of months. In the past decade, deep learning analysis of image and video data has reached competitive accuracy with human scoring, while taking significantly less time and reducing sources of bias. While the accessibility of deep learning has greatly improved, creating novel pipelines for domain-specific research experiments can be a significant hurdle for labs. This poster highlights our progress in developing data capture systems and deep learning pipelines to study mouse models of human neurological traits and disorders. We focus on the analysis of traditional behavioral paradigms, gait analysis, facial expression analysis, and microscopy data. Additionally, we discuss future directions using semi-supervised features for action classification, synthetic training datasets, and Inverse Reinforcement Learning modeling of motivation.

Poster #10: Krishna Phanindra Valivety

Analysis of Factors Impacting House Pricing

Farukh Jabeen, Sandeep Inapanuri, Krishna Phanindra Valivety

In the retail housing business, it is always interesting to learn about the people who buy homes. Given the many options available to buyers, it is hard for property builders and managers to design homes or rentals that meet every customer's needs. It is often assumed that the size of a home is directly proportional to its price; however, size may not be the only factor determining price, and many independent variables may influence the price of a house. The real estate industry has been steadily growing and is now a major sector of the economy at large. In recent years, urban property values have risen quickly overall, pushing prices well beyond the means of the majority of citizens. The issue of rising property prices has therefore progressively become a severe economic and social problem that negatively impacts residents' quality of life. Our objective is to fetch housing price data, pre-process it, apply various algorithms, compare the results of each algorithm on two different datasets, and arrive at a conclusion on the parameters that impact housing prices. After careful consideration and review of the data, we prepare and clean it so that we have a curated dataset for further analysis. We perform feature engineering to cover gaps in the data and prepare the variables that may influence house prices, followed by exploratory analysis. To obtain robust models, we split the data into training and test sets. We use machine learning algorithms such as (1) Random Forest, (2) Support Vector Machine, and (3) XGBoost Regressor to build predictive models and perform a comparative analysis. Results: Comparing Random Forest, Support Vector Regression, and XGBoost, the XGBoost regressor gives the best result. Lower root mean square error (RMSE) values imply higher model accuracy, while higher R-squared values are desirable for prediction. XGBoost achieved the most favourable RMSE and R-squared values, suggesting a well-fitted model that predicts prices accurately in absolute terms. Price is highly correlated with year built and lot square footage, suggesting that the most expensive houses are those built after 1990 and with bigger lot areas. Houses built after 1990 sold at higher prices than those from earlier years, and houses with condition values of 3 and 5 sold at higher prices, where a higher value indicates a newer or better-maintained home.
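The sketch below shows the kind of comparison described (Random Forest, SVR, and XGBoost regressors scored by RMSE and R-squared on a held-out split); the generated features and prices are placeholders, since the actual housing datasets and tuned hyperparameters are not reproduced here.

```python
# Hedged sketch: comparing three regressors on placeholder housing data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                                   # placeholder features
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=1000)    # placeholder prices

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {"RF": RandomForestRegressor(random_state=0),
          "SVR": SVR(),
          "XGB": XGBRegressor(random_state=0)}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5                 # lower is better
    print(name, "RMSE:", round(rmse, 3), "R2:", round(r2_score(y_te, pred), 3))
```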

Poster #11: Yousef Rafique

Container-based Edge Video Processing for Smart Traffic Control in Carleton-Cisco’s IoT Testbed

Yousef Rafique and Mohamed Ibnkahla, Internet of Things Lab, Systems and Computer Engineering Department, Carleton University

Traffic congestion is a major contributor to carbon dioxide (CO2) emissions, and traffic signals play a crucial role in exacerbating this issue. A Helsinki-based study reveals that Intelligent Traffic Management Systems (ITMS) can potentially reduce CO2 emissions by up to 30%. As such, this poster presents an innovative, easy-to-deploy, scalable, and cost-efficient solution for smart traffic control. The ITMS application is one of the use-cases deployed on the Carleton-Cisco IoT testbed. The primary objective of the ITMS application is to minimize idle wait times by dynamically controlling traffic flow. Cameras installed at intersections provide live feeds of the current traffic, and the system uses image processing techniques to count the number of cars in each direction and lane. By leveraging real-time vehicle volume data at each intersection, traffic lights can be timed accordingly. Lightweight container technology is utilized to package the application to cater to IoT's stringent resource requirements, resulting in a portable video processing application that can be deployed at the network Edge. By processing data where it is generated, the ITMS application offers a low latency solution to traffic congestion without the need for Cloud or Internet connectivity, making it a practical solution for urban traffic management.
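As an illustrative sketch only (not the deployed ITMS pipeline), the snippet below counts moving objects in a camera feed using OpenCV background subtraction; the video file name, morphology kernel, and minimum blob area are assumptions, and a production system would more likely use a trained vehicle detector per lane.

```python
# Illustrative sketch: counting moving vehicles with background subtraction.
# "intersection.mp4" and the area threshold are hypothetical placeholders.
import cv2

cap = cv2.VideoCapture("intersection.mp4")
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=50)
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)          # remove small noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    vehicles = [c for c in contours if cv2.contourArea(c) > 1500]  # assumed vehicle size
    print("moving vehicles in frame:", len(vehicles))

cap.release()
```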

Poster #12: Andrea Payne

Improving Coherence of Non-Negative Matrix Factorization (NMF) Grouped Topic Models by Comparing tf-idf to "Bag of Words"

Andrea Payne, School of Mathematics and Statistics, Carleton University

Introduction: There are two primary methods for topic modelling within text analysis: probabilistic topic models and non-negative matrix factorization (NMF). This project focuses on NMF, as recent advancements have lightened the computational requirements and allowed for inference, like permutation testing and bootstrap sampling. This inference ability is significant as, in general, inference remains computationally demanding. However, current research uses a "bag of words" (BOW) approach to build the initial term-document matrix (TDM). This approach is antiquated in other applications and underperforms compared to other term representations. As such, this project explores how the coherence of topics improves by utilizing term frequency-inverse document frequency (tf-idf), a simple but effective weighted term count which boosts the weight of rare words while penalizing common jargon. Methodology: This project focuses on an application-based approach by utilizing data from 2,225 BBC news articles published in 2004-2005. Two models are fit, one with a BOW-informed TDM and the other with tf-idf, to identify the main topics in these articles. The models are compared by their coherence measure, that is, how interpretable the topics are to humans. Two main limitations to this approach exist. First, selecting the number of topics remains subjective, as it is decided before fitting the model. Second, this project only utilizes news articles due to time constraints. Results: This project found that the TDM built with tf-idf weights increased the coherence of topics by 88% relative to that of the BOW frequencies. Future work includes utilizing more datasets and improving data cleaning methods.
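To make the BOW vs. tf-idf contrast concrete, the sketch below fits NMF topic models on a toy corpus with both term-document representations and prints the top words per topic; the five example sentences stand in for the BBC articles, and coherence scoring is assumed to be done separately as described above.

```python
# Hedged sketch: NMF topics from a bag-of-words TDM vs. a tf-idf TDM.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import NMF

docs = ["stocks fall as markets react to rate hike",
        "team wins final after dramatic penalty shootout",
        "new phone release drives record tech sales",
        "central bank signals further interest rate increases",
        "injury forces star striker out of the season"]

def top_words(vectorizer, n_topics=2, n_words=4):
    tdm = vectorizer.fit_transform(docs)                      # term-document matrix
    nmf = NMF(n_components=n_topics, init="nndsvda", random_state=0).fit(tdm)
    terms = vectorizer.get_feature_names_out()
    return [[terms[i] for i in comp.argsort()[-n_words:]] for comp in nmf.components_]

print("BOW topics:   ", top_words(CountVectorizer(stop_words="english")))
print("tf-idf topics:", top_words(TfidfVectorizer(stop_words="english")))
```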

Poster #13: Samuel Egan

Know Your Risk: Heart Disease Predictive Modeling based on Behavioral Health Risk

Akhand, Rayyan¹, Di Lorio, Katherine², Egan, Sam³ ¹ M.A.Sc Biomedical Engineering ² M.Sc Biology ³ M.Cog.Sc.

Cardiovascular disease (CVD) is the leading cause of mortality worldwide, resulting in the deaths of 17.9 million individuals annually, according to the World Health Organization. Early detection of CVD by identifying important risk factors can significantly improve patient outcomes. This study aims to classify individual risk of CVD based on behavioral health factors using Naïve Bayes and Random Forest machine learning (ML) algorithms, and determine the most influential factors among different cohorts. If successful, ML could prove to be an invaluable tool in the field of medicine and bioinformatics. There are no known risks in this study as our research solely involves secondary use of anonymized data. The dataset was retrieved from the 2021 USA BRFSS survey by the CDC. We mined the publicly available data, prepared it, then trained and tested ML models all using Python3. The most relevant features contributing to the algorithm’s prediction of CVD were identified using Gradient Boosting feature selection. The original survey was taken from a diverse background with varying demographics which increased the model’s generalizability. However, the dataset was severely imbalanced; thus four different sampling methods were implemented and compared to handle this limitation. Our results showed the best performing model to have an accuracy of above 95% among all cohorts. In future research we hope to see extended comparison between different models, as well as an effort to transform the algorithm into a publicly available tool to allow individuals to consistently check their ongoing risk for CVD and act proactively.

Poster #14: Bingjun Tang

Predicting Canadian Annual Income Through Census Data

Puxin Shi, Wenyan Luo, Bingjun Tang Master of Engineering: Data Science and Analytics, Master of Business Administration, Master of International Affairs

We attempt to build a predictive model for Canadians' income level using data obtained from the 2016 Census of Population Public Use Microdata File, as well as to distinguish key determinants of income in the Canadian context. Our work can potentially aid policymakers in identifying low-income populations from measurable socioeconomic characteristics and designing corresponding policies that address the underlying factors of income inequality. Since measures that could implicate respondents' confidentiality are already reduced in detail, no foreseeable ethical concerns are present. The dataset includes 930,421 observations and 123 individual and family-related socioeconomic features. An estimated weight is also attached to each observation to ensure the sample's representativeness of the population. 27 features judged to be the most relevant in determining income on a theoretical basis were selected for predictive modeling after undergoing one-hot encoding transformation. Using version 2022.12.0+353 of RStudio, multiple regression, random forest, and naive Bayes were used to fit the data, with sample weights passed into each model's weight parameter. Unusable data amounting to 495,643 observations could lead to biased results, as these observations are unlikely to be missing at random. Results demonstrate that while multiple regression achieved an R2 of 0.227, random forest and naive Bayes returned accuracy scores of 0.64 and 0.624 respectively when income is divided into five categories based on the 2015-2016 income tax brackets. Mean decrease in Gini importance generally shows age as the predominant determinant of income, followed by industry sector, field of study, and the number of rooms in dwellings.

Poster #15: Mohamed Basyouni

Analysis of China's GDP recovery after the COVID-19 pandemic

Mohamed Basyouni

This report presents an analysis of China's GDP recovery after the COVID-19 pandemic and the development of a machine learning model to predict the GDP of Chinese cities in the post-COVID era. The COVID-19 pandemic severely affected the Chinese economy, causing a significant decline in the country's GDP. However, China has shown remarkable resilience in its response to the pandemic and has managed to revive its economy successfully. This report analyzes the measures taken by the Chinese government to revive its economy and the impact of those measures on GDP growth [1]. We will study how the changed economic structure supported China's survival in the coronavirus pandemic [2]. We want to study what became the backbone of China's national economy after the coronavirus pandemic [3]. Furthermore, this report proposes a machine learning model that predicts the GDP of Chinese cities for the post-COVID era [4]. The model was developed using a dataset of various economic indicators, including industrial production, fixed asset investment, and retail sales. The model uses advanced statistical techniques to identify the significant economic indicators and their impact on GDP growth.

Poster #16: Atamson Atam

Twitter Troll Detection: A Language Model Approach

Atamson Atam (Norman Paterson School of International Affairs), Mohamed Elbeltagi (Physics Department), Deepro Sengupta (Sprott School of Business)

Foreign digital interference via online social media trolling may serve as a strategic tool to spread misinformation. This practice could influence public reactions to social events, policies, and even democratic processes such as election outcomes. Investigations into interference in the 2016 US presidential election traced some 200,000 tweets from about 3,000 troll accounts to the Internet Research Agency (IRA) in St. Petersburg, Russia. Can language transformer models be trained to detect, and possibly mitigate, the spread of political troll tweeting? We utilized two datasets: a troll dataset of 100k confirmed troll tweets (Russian IRA-produced tweets, 2016) and a non-troll dataset of 100k general political tweets from the 2020 elections (excluding any originating outside the US). We tokenize and encode tweet texts, then pass them through a pre-trained transformer to obtain text features, which are used to train a classifier head that outputs the confidence of the tweet belonging to the troll or non-troll class. Three distinct language models ((1) DistilBERT, (2) fine-tuned TinyBERT, (3) SetFit) were trained on the combined troll and non-troll dataset with an 80%-20% train-test split. The results obtained after training were: DistilBERT, classification accuracy of ~92%; fine-tuned TinyBERT, ~98%; SetFit, ~80%. The fine-tuned transformer achieved the highest accuracy despite fine-tuning a smaller (more distilled) transformer than the first approach (due to GPU memory constraints). SetFit achieved good accuracy despite being trained on much less training data than the other two models.
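The snippet below is a hedged sketch of fine-tuning a small pre-trained transformer (DistilBERT here) for troll vs. non-troll classification with the Hugging Face Transformers library; the two example tweets, labels, and training arguments are placeholders, and the real runs used the 200k-tweet datasets described above.

```python
# Hedged sketch: fine-tuning DistilBERT as a binary troll/non-troll classifier.
# Tweets, labels, and training arguments below are illustrative placeholders.
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

tweets = ["example troll tweet text", "example ordinary political tweet"]
labels = [1, 0]                                   # 1 = troll, 0 = non-troll

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
enc = tok(tweets, truncation=True, padding=True)

class TweetDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in enc.items()}
        item["labels"] = torch.tensor(labels[i])
        return item

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)
args = TrainingArguments(output_dir="troll-clf", num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=TweetDataset()).train()
```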

Poster #17: Rebecca Tobin

Reliability analysis of clinical outcomes for a rare neuromuscular disease

Rebecca Tobin, Eric Hoffman, Michela Guglieri, Paula Clemens, VBP15-LTE and VBP15-004 investigators, Utkarsh Dang; Carleton University, Binghamton University, Newcastle University, University of Pittsburgh School of Medicine, NA, Carleton University

Background: Duchenne muscular dystrophy (DMD) is an X-linked genetic disorder affecting approximately 1 in every 3600 male births. DMD causes progressive muscle degeneration and loss of motor function and ambulation. Two clinical trials provided two independent, well-controlled cohorts with harmonized outcomes and protocols. Understanding the reliability of commonly used outcomes is important to distinguish change from variability. Objective: To analyze test-retest reliability of outcomes using pre-treatment (screening and baseline) measurements. Methods: Clinical outcomes were either proxies of function (e.g., stand-from-supine velocity) or more direct measures of muscle strength (e.g., knee extension). Two different myometry techniques were used in the two trial cohorts (the CINRG Quantitative Measurement System [CQMS] vs. the MicroFET2 handheld digital muscle dynamometer). We used Bland-Altman analysis, the intraclass correlation coefficient (ICC), and the coefficient of variation (%CV). Results: The ICC ranged from good to moderate for functional tests and from moderate to poor for myometry outcomes. Reliability was superior for the dynamometer compared to CQMS when outliers were removed; however, the confidence intervals were wide and overlapping. Reliability decreased with younger age for two functional tests. Trends in the %CV did not always agree with trends in the ICC. Conclusions: The two cohorts of boys with DMD yielded poor to good reliability for commonly used clinical outcomes. A nuanced comparison of handheld myometry measurements vs. CQMS emerged. Age, outliers, and test difficulty were found to influence reliability, but days between repeated measurements did not. Our findings can be used to reduce study burden on boys with DMD.
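For one outcome, the test-retest comparison can be sketched as below: a Bland-Altman bias with 95% limits of agreement and an approximate within-subject %CV computed from screening vs. baseline values. The six measurements are placeholders, not trial data.

```python
# Hedged sketch of a Bland-Altman test-retest summary for one clinical outcome.
# The screening/baseline values below are placeholders, not trial data.
import numpy as np

screening = np.array([4.1, 5.0, 3.2, 6.3, 4.8, 5.5])   # e.g. stand-from-supine velocity
baseline  = np.array([4.3, 4.7, 3.5, 6.0, 5.1, 5.2])

diff = baseline - screening
bias = diff.mean()                                      # systematic difference
loa = 1.96 * diff.std(ddof=1)                           # half-width of 95% limits of agreement
within_sd = diff.std(ddof=1) / np.sqrt(2)               # within-subject SD from paired diffs
cv = 100 * within_sd / np.concatenate([screening, baseline]).mean()

print(f"bias = {bias:.2f}, limits of agreement = ({bias - loa:.2f}, {bias + loa:.2f})")
print(f"approximate %CV = {cv:.1f}")
```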

Poster #18: Andre Telfer

Automated Analysis Pipelines in Behavioral Neuroscience

Andre Telfer (1), Brenna MacAuley (1), Thea Jewer (1), Dana Wymark (2), Andrea Smith (1), Frances Sherratt (1), Abbie Smith (1), Rhishita Mondal (3), Geoffrey Datema, Emmerson Borthwick (1), Dr. Vern Lewis (1), Dr. Amedeo D’Angiulli (1), Dr. Shawn Hayley (1), Dr. Argel Aguilar-Valles (1), Dr. John Lewis (4), Dr. Oliver van Kaick (3), Dr. Alfonso Abizaid (1); 1: Carleton University Department of Neuroscience, 2: Carleton University Department of Cognitive Science, 3: Carleton University Department of Computer Science, 4: University of Ottawa Department of Biology.

In the 21st century, understanding the human brain remains a distant challenge with wide-ranging impacts across fields in the health sciences, social sciences, and Artificial Intelligence. Mouse models have been an important method for researchers to study human neurological traits and disorders. These studies can produce massive amounts of multimodal data that are often quantified manually over the course of months. In the past decade, deep learning analysis of image and video data has reached competitive accuracy with human scoring, while taking significantly less time and reducing sources of bias. Though the accessibility of deep learning has greatly improved, creating novel pipelines for domain-specific research experiments can be a significant hurdle for labs. This poster highlights our progress in developing data capture systems and deep learning pipelines to study mouse models of human neurological traits and disorders. We focus on the analysis of traditional behavioral paradigms, gait analysis, facial expression analysis, and microscopy data. Additionally, we discuss future directions using semi-supervised features for action classification, synthetic training datasets, and Inverse Reinforcement Learning modeling of motivation.

Poster #19: Amy Resmer

Nothing happens until something moves. (Einstein) --- Analyzing Canadian Census Commuter Data

Ksenia Ekimova (Master of Public Policy & Administration with a specialization in Data Science), Oz Kilic (Master of Computer Science), Amy Resmer

The aim of this study is to identify areas with high public transit commuter activity and determine whether these areas are occupied by a high percentage of lower socioeconomic families. These findings are analyzed to provide support for transit subsidies and public transit route recommendations. The motivation for this study developed from the importance of examining the socioeconomic factor of income and its influence on public transit utilization in differing communities. Socioeconomic status determines an individual's quality of life, so we find it essential to study the various factors and their influence on the standard of living. Location-wise aggregated responses to Statistics Canada's 2016 Census of Population were used to identify significant commute flows and low-income populations. Identification of these areas leads to recommendations on the viability and impact of public transit subsidies, and supports informed recommendations to cities on specific areas for service improvements. Our findings did not reveal negative correlations between socioeconomic status and the use of transit or the length of one's commute. We believe this is due to limitations within our dataset, including low granularity and the absence of paired data. Future research would focus on finely grained, paired data to support our hypothesis.

Poster #20: Wonhyeong Chae

Comparative Analysis of Different Models for Predicting Canadian Consumer Price Index (CPI)

William Chae, Eric Jamieson, Jessie Yin, Majid Komeili (Carleton University)

As inflation becomes an increasingly pressing issue for Canadians, more research is required in order to more accurately forecast price changes to consumer goods in Canada. While there has been research into identifying which variables might be the most impactful in predicting Canadian CPI, there has not yet been a comparative analysis which sets out to determine the most accurate modeling methodology for predicting Canadian CPI. This project adds to the existing research on predicting Canadian CPI by identifying which types of models yield more accurate prediction results, as well as identifying which variables are the most important in predicting CPI across models. This project investigates the ability of recurrent neural networks (RNNs), specifically long short-term memory (LSTM) models and gated recurrent unit (GRU) models, as well as the gradient boosting framework XGBoost, support vector regression (SVR), and the more traditional models of vector autoregression (VAR) and seasonal autoregressive integrated moving average, with and without exogenous factors, (SARIMA/SARIMAX) to predict CPI. The primary dataset is the monthly percent change in Canadian CPI, which is the label to be predicted. Features used to predict percent change in CPI include a collection of indices of farming products, building materials, raw materials, imports/exports, percent change in the M2++ money supply, and historical CPI. Grid searches were performed to find the optimal hyperparameters, according to lowest root mean square error (RMSE). The final RMSE on the test dataset is lowest for SARIMA (0.0046), followed by SARIMAX (0.0052).
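A hedged sketch of the SARIMA step is shown below using statsmodels; the synthetic monthly series and the chosen (p,d,q)(P,D,Q,s) orders are placeholders rather than the grid-searched configuration whose RMSE is reported above.

```python
# Hedged sketch: fitting a SARIMA model to monthly CPI percent changes and
# forecasting ahead. The series below is synthetic, not Canadian CPI data.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
cpi_pct_change = pd.Series(
    0.2 + 0.1 * np.sin(np.arange(120) * 2 * np.pi / 12) + rng.normal(0, 0.05, 120),
    index=pd.date_range("2010-01-31", periods=120, freq="M"))

model = SARIMAX(cpi_pct_change, order=(1, 0, 1), seasonal_order=(1, 0, 1, 12))
fit = model.fit(disp=False)
forecast = fit.forecast(steps=6)            # next six months of CPI change
print(forecast.round(3))
```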

Poster #21: Harsh Ashokbhai Patel

Predicting the Risk of Developing Diabetes Using Machine Learning Models

Harsh Ashokbhai Patel, Deepkumar Kiritkumar Patel, Parvaneh Jalerajabi, Majid Komeili (Carleton University)

Introduction: Diabetes is a chronic medical condition characterized by high blood sugar levels. It affects millions of people worldwide and can lead to serious health complications if left untreated. Early detection and management of diabetes can significantly reduce the risk of complications and improve the quality of life for individuals with the condition [2]. Machine learning models have emerged as a promising approach for predicting the risk of developing diabetes. These models use statistical algorithms to analyze large datasets and identify patterns and correlations between various risk factors and the likelihood of developing diabetes. Several risk factors have been identified as contributing to the development of diabetes, including age, weight, family history, lifestyle factors, and medical history. Machine learning models can be trained on datasets containing these risk factors and other relevant data to predict an individual's risk of developing diabetes [3]. The results obtained from the study can aid healthcare professionals in identifying high-risk individuals and developing targeted interventions to prevent or delay the onset of diabetes. Methodology: The methodology involved several steps. First, the dataset was preprocessed to remove missing values, normalize the data, and encode categorical variables. Then, the dataset was split into training and testing sets for model development and evaluation, respectively. Next, the study applied various machine learning algorithms, such as Random Forest, Logistic Regression, Decision Tree, and Support Vector Machine (SVM), to develop predictive models for diabetes risk prediction. The models were evaluated using performance metrics such as accuracy, precision, recall, and F1-score to determine their effectiveness in predicting diabetes risk. Dataset and Limitations: A comma-separated-values (.csv) file from the imbalanced dataset [1] on Kaggle, which comprises 253,680 survey responses, was used for this project. It includes a target variable with three stages: 0 denotes no diabetes or diabetes that occurs exclusively during pregnancy, 1 denotes prediabetes, and 2 denotes diabetes. The dataset has various features that help with prediction, including high cholesterol, high blood pressure, body mass index, smoking status, stroke, heart disease or attack, physical activity, and fruit consumption. The dataset was collected in 2015, which may not accurately reflect the current state of diabetes diagnosis and management, given the advancements in technology and changes in socio-economic factors since then. Conclusion: The Diabetes Health Indicators Dataset can be used to develop predictive models for classifying individuals as having diabetes or no diabetes. By applying machine learning algorithms to this dataset, researchers can improve early detection and intervention of diabetes, leading to better management of the disease and its associated health outcomes. References: 1. Teboul, A. (2020). Diabetes Health Indicators Dataset. Retrieved from https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset 2. American Diabetes Association. (2021). Standards of Medical Care in Diabetes. Retrieved from https://care.diabetesjournals.org/content/44/Supplement_1 3. Zhu, Y., & Jiang, X. (2020). Machine learning in diabetes research. Journal of Diabetes Investigation, 11(5), 1115-1125. https://doi.org/10.1111/jdi.13211

Poster #22: Surya Alagathi Ekantharajan

Using U-Net architecture to achieve image segmentation and classification of glioblastoma tumour grade features

Surya Alagathi Ekantharajan, Qiushou Jia, Venkata Krishna, Majid Komeili (Carleton University)

Gliomas are among the most common types of tumours and are classified as the most aggressive and dangerous as far as brain tumours go. Almost 11-13% of all brain tumours are classified as gliomas. This project looks at only High-grade Gliomas (HGG) and Low-grade Gliomas (LGG). Once an MRI image is fed into the program, it is pre-processed and segmented using the CNN architecture U-Net, and the segmented tumour image, along with consolidated features, is fed to a classifier. This project aims to reduce the time and mental load of the segmentation, classification, and feature extraction process, and to improve the efficiency, swiftness, and accuracy of diagnosis. The project is divided into classification and segmentation. The dataset used for segmentation is the BraTS dataset, which has around 371x4 3D MRI images and mask images containing the various portions of the tumour manually segmented by expert radiologists. For the segmentation process, over a span of 100 epochs, the U-Net algorithm yields 97% validation accuracy and 98% training accuracy. This project uses the BraTS 2020 dataset for glial tumour segmentation and BraTS 2018 for classification; for testing purposes, BraTS 2019 is used extensively. For classification, we get 98% training accuracy and 90% testing accuracy. The potential of the project can be further expanded by implementing a graphical user interface, which will make the project more user-friendly. Accuracy, being a pivotal aspect of the project, can be improved by enlarging the training dataset and training for more than 100 epochs while decreasing the learning rate, which increases validation and training accuracy.

Poster #23: Catherine Monteith-Pistor

Forecasting Battle Scenario Outcomes and Mission Success using MANA

Aidan Lochbihler, Artur Kivilaht, Catherine Monteith-Pistor, Ahmed El-Roby (Carleton University)

The ongoing conflict in Ukraine, with Russia's indiscriminate targeting of civilian infrastructure, has highlighted the importance of adequate air defence (AD) against various threats. An improved understanding of such threats would help plan resource allocation and tailor self-defence mechanisms. This study aims to test the impact of various parameters on AD mission effectiveness using artificial data that approximates the real world in the agent-based model Map-Aware Non-linear Automata (MANA). Through the positioning of agents (e.g. radar towers, missile launchers, hostile actors) in MANA, 28,000 simulated scenarios were constructed to generate the necessary dataset for measuring parameter impact on mission success. Machine learning (XGBoost) was used to process critical input parameters and to predict the number of allied casualties in each scenario. SHAP then provided insight into the parameters that had the greatest impact on the model outcome. Data farming presents a significant advantage over field-collected data due to its clean state. However, due to time and computer processing limitations, only six key parameters were analyzed. While these parameters have been identified by defence experts as most likely to impact mission success, testing additional parameters to verify results would be valuable. Predicting the impact of key parameters such as reload time or detection range capabilities of missile launchers will allow us to provide insight into the effectiveness of operational self-defence concepts. Ultimately, simulated battle scenarios in MANA could support real-time decision-making without the resource costs associated with live trials.
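The modelling-plus-attribution step can be sketched as follows: fit a gradient-boosted regressor on scenario parameters and rank them by mean absolute SHAP value. The three parameter columns and the toy casualty outcome are placeholders for the six MANA parameters and the farmed results.

```python
# Hedged sketch: XGBoost regression on simulated scenario parameters plus
# SHAP-based ranking of parameter impact. All data below is synthetic.
import numpy as np
import pandas as pd
import shap
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
params = pd.DataFrame({
    "reload_time": rng.uniform(2, 20, 2000),
    "detection_range": rng.uniform(10, 80, 2000),
    "missile_speed": rng.uniform(200, 900, 2000),
})
casualties = (0.5 * params["reload_time"] - 0.1 * params["detection_range"]
              + rng.normal(0, 1, 2000))              # toy outcome, not MANA output

model = XGBRegressor(n_estimators=200, random_state=0).fit(params, casualties)
shap_values = shap.TreeExplainer(model).shap_values(params)
impact = pd.Series(np.abs(shap_values).mean(axis=0), index=params.columns)
print(impact.sort_values(ascending=False))           # mean |SHAP| = parameter impact
```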

Poster #24: Afsoon Khodaee

Computer image analysis of placental histology images for cardiovascular risk screening

Afsoon Khodaee, Carleton University

Background: Placental histopathology images (PHIs) help identify various placental abnormalities that can be associated with disorders of the fetus and mother. Lesions of placental maternal vascular malperfusion (MVM) have been associated with postpartum cardiovascular (CV) risk, suggesting the potential of PHI analysis for CV risk screening. However, assessment of PHIs has relied on specialized perinatal pathologists, is time-consuming, and is subject to intra- and inter-observer variability. Automating PHI analysis using advanced machine learning methods can improve the efficiency of referral to specialized postpartum clinics for interventions to mitigate CVD risk. Objective: In this study, I am developing deep learning algorithms on PHIs to detect placental abnormalities and identify women with high lifetime CV risk. Methodology: PHIs present the tissue phenotype, which can be quantified with computational methods such as deep learning. However, analyzing this type of image is problematic due to several challenges in automating PHI analysis. The main research challenges are: 1) the diversity of biological structures (e.g., size, shape) in histology images, 2) the heterogeneous presentation of MVM lesions, and 3) the large size and high level of detail of gigapixel histology whole-slide images (WSIs). I am focusing on weakly supervised learning based on attention and a multi-resolution approach to address these challenges. Through established research collaborations, I have access to hundreds of archived placental specimens with CV risk assessments. Results/conclusion: I have currently implemented attention-based weakly supervised learning to account for the heterogeneity of MVM lesion presentation. Using this approach, a high accuracy (91%) has been obtained. The research outcomes can in the future be applied to remove the bottleneck associated with human analysis of medical images such as PHIs, which would allow CV risk screening to become standard postpartum clinical practice and greatly impact women's health.
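As a minimal sketch of attention-based weakly supervised (multiple-instance) learning, the module below pools patch-level features from a whole-slide image with learned attention weights into a slide-level prediction; the feature dimension, hidden size, and random patch features are assumptions for illustration, not the study's actual model.

```python
# Hedged sketch: attention pooling over patch features for slide-level prediction.
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh(),
                                       nn.Linear(hidden, 1))
        self.classifier = nn.Linear(feat_dim, 1)

    def forward(self, patch_feats):                 # (n_patches, feat_dim)
        weights = torch.softmax(self.attention(patch_feats), dim=0)  # attention per patch
        slide_feat = (weights * patch_feats).sum(dim=0)              # weighted pooling
        return torch.sigmoid(self.classifier(slide_feat)), weights

# Example: 300 placeholder patch embeddings from one slide.
patches = torch.randn(300, 512)
prob, attn = AttentionMIL()(patches)
print("slide-level MVM probability:", prob.item())
```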

Poster #25: Anamarie Gennara

Frost Watch: Detecting and Preventing Frozen Water Pipes

Eren Egitman, Anamarie Gennara, Abdul Mutakabbir, Olga Baysal (Carleton University).

The City of Ottawa has one of the largest interconnected water systems in Canada, with 3,000 km of water mains coverage [1]. When freezing temperatures are expected during the winter months, the City of Ottawa leverages weather data to issue Let Water Run (LWR) notices in the hopes of preventing frozen pipes. However, $150,000 was spent in 2022 to mend frozen pipes, still incurring a great financial cost to the City and the general public [3]. This project aims to minimize expenses by leveraging predictive modelling and other data science methodologies. Preliminary findings suggest copper pipes, seasonal fluctuations, and rural settings are key factors to consider in frozen water service management.

Poster #26: Gyse Martine Jean Louis

Assessing Parametric Dependence for Threat Interception in Air Defence Simulations

Minuka Hewapathirana, Gyse Martine Jean Louis, Arka Singh, Majid Komeili (Carleton University)

Global military conflicts have continuously highlighted the importance of effective air defence in saving lives during times of crisis. Our project aims to optimize air defence (AD) effectiveness by assessing important variables and applying predictive modelling to farmed data, assessing performance beyond the ranges supported by the simulation software so that the approach can be extended to various global AD systems. In the first phase of the project, we farm data from 2500 scenarios, varying four parameters in the MANA software: i) the speed of threat missiles, ii) reload time, iii) sensor range, and iv) shooting range of the AD systems. We designed a measure of Combined Mission Effectiveness (CME), which is a weighted combination of two output parameters: the number of missiles destroyed and the number of targets hit by the missiles. Owing to the success of Extreme Gradient Boosting (XGBoost) in regression predictive modelling, it is applied to the farmed data to find the parameter ranges that give optimal CME. Our analysis shows the significance of the reload time of the AD; the best mission outcomes can be expected for lower reload times. Further, the sensor capability and shooting range of the AD showed an interesting interplay with one another, and increasing both values resulted in higher CME. Threat speed is anticipated to have the lowest effect on CME; however, this could be attributed to other variables in play, such as simulation time and missile type.

Poster #27: Youcef Kardjadja

DRL-based server utility maximization in MEC

Youcef Kardjadja, Yacine Ghamri-Doudane, Mohamed Ibnkahla, Internet of Things lab at Carleton University, L3i lab at La Rochelle University, Internet of Things lab at Carleton University

Mobile Edge Computing (MEC) is a popular and promising paradigm that allows service providers to serve their users from nearby servers. To fully leverage the advantages of MEC, service providers must efficiently and effectively manage scarce edge server resources, which is a challenging task. As mobile user and IoT node requests arrive and depart dynamically in different locations, choosing the server that processes an arriving request has to be done in real time. Furthermore, the limited server capacity and coverage complicate the decision problem, which aims to maximize the number of edge-processed requests while minimizing the cost of hiring edge servers. In this work, we propose a cost-effective Deep Reinforcement Learning (DRL) approach to map arriving requests to the distributed edge servers. The DRL approach is adapted to the distributed properties of MEC and trained to maximize the processing utility of edge servers while maintaining a high ratio of edge-processed requests. A series of experiments has been conducted to evaluate the performance of the approach, and the results demonstrate its efficiency in reducing the server hiring cost as well as maintaining an acceptable request processing time.

Poster #28: Fuguang Chen

The Aging, Payroll Tax Rate and Unemployment Rate Matching Model Based on Search Intensity

Fuguang Chen, Department of Computer Science; Yunhe Zhang, Department of Business Administration, Royce Wang, Department of Business Administration

This project aims to find the transition mechanism from aging to unemployment through the channel of tax rate adjustment. First, changes in the age structure of the population place an increasing financial burden on current Pay-As-You-Go pension payments. In response to the increasing inability to make ends meet, the government has adopted relevant measures including taxation and delayed retirement, which objectively alleviated the financing pressure of the pension gap but also reduced the net income of enterprises and their willingness to supply job vacancies, which will inevitably have a theoretical impact on the employment rate of the labor force. Second, given the impact of aging on public finances, especially pensions, the government needs to adopt relevant policy measures such as corporate income tax or payroll tax in order to maintain the balance of pension revenue and expenditure, which will further affect the choices of enterprises and workers in the labor market. To verify the correctness of these intuitive assumptions, we take an empirical approach: using cross-national data, we establish an econometric model and quantitatively analyze how aging affects labor-market variables, especially the unemployment rate. The study shows a new path through which aging affects fiscal revenue, expenditure, and taxation, thereby affecting the unemployment rate.

Poster #29: Ananthu Nair

Community Graphs on Cybersecurity Signals Model Development

Ananthu Nair, Divya Ashokbhai Maniya, Shaheen Matiur Rehman Shaikh, Ahmed El-Roby, Carleton University, Carleton University, Carleton University, Carleton University

This project aims to use machine learning algorithms to detect anomalies on graph networks, specifically communities of nodes that correspond to users or computers being attacked within an organization. The approach involves collecting data from various sources and applying heuristic anomaly detection algorithms. Unsupervised machine learning algorithms are employed to quantify abnormal behavior in these networks. The methodology involves harmonizing data from different sources, identifying the most accurate anomaly detection algorithm, determining infected clusters in the network, creating a visual representation of the infected community, and providing threat reports in the required format. This project is motivated by the need for an aggregated view of network anomalies to understand the affected areas of infection and to quickly remediate and avoid large-scale network infections. To achieve this, community graph methods are used to analyze anomalous cybersecurity activity from a sample network provided by Interset, the AI, ML, and analytics center of excellence for CyberRes, the security division of Micro Focus. The goal is to identify the most effective algorithm for anomaly detection in large-scale graphs.
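A hedged sketch of the community-graph idea follows: build a graph of users and hosts, detect communities with networkx, and flag communities whose members carry high average anomaly scores. The edge list, per-node scores, and threshold are placeholders; the project's actual anomaly detection algorithms and Interset data are described above.

```python
# Hedged sketch: community detection over a user/host graph with per-node
# anomaly scores, flagging communities with high average anomaly.
import networkx as nx

edges = [("userA", "host1"), ("userB", "host1"), ("userC", "host2"),
         ("userD", "host2"), ("userC", "userD"), ("userA", "userB")]
anomaly_score = {"userA": 0.1, "userB": 0.2, "host1": 0.1,
                 "userC": 0.9, "userD": 0.8, "host2": 0.7}   # assumed detector output

G = nx.Graph(edges)
communities = nx.algorithms.community.greedy_modularity_communities(G)
for i, community in enumerate(communities):
    avg = sum(anomaly_score[n] for n in community) / len(community)
    flag = "possible infection" if avg > 0.5 else "normal"   # assumed threshold
    print(f"community {i}: {sorted(community)} avg_anomaly={avg:.2f} -> {flag}")
```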

Poster #30: Nicolas Laham

Tracking the Change in Sentiment and Citations from 1945 to 2023 in the IEEE Literature

Nicolas Laham, Ali Abdelbadie, Zhaoyuan Mei, Olga Baysal (Carleton University)

Running large-scale analyses on journal article data has been a growing interest in recent years. Findings from this emerging literature suggest that (1) older publications are being cited more than new publications; (2) ground-breaking (or disruptive) papers are thinning; and (3) sentiment in articles has become more consolidative. This suggests that (a) the scientific culture is changing, and (b) the aims of scientific publication have shifted. We are primarily interested in seeing whether this sentiment change is present in the field of technology. Additionally, we are interested in predicting citation counts with a neural network model. To assess whether sentiment has changed in the field of technology, we found it appropriate to use IEEE journals as our data source. The IEEE Xplore API was used to obtain data (in JSON format) on articles from 1945 to 2023. The full dataset of 15,600 articles (200 per year) contains article titles, abstracts, numbers of citations, and numbers of patents (which have been allocated more weight than citations). Only articles with an abstract are used for analyses. After reducing the weight of high-frequency terms using Term Frequency-Inverse Document Frequency (TF-IDF), a text analysis was conducted with a convolutional neural network (CNN). This CNN model was also used to predict how likely an article is to be cited. Our CNN model yielded a high true positive rate (0.7939) and F1 = 0.7136.

Poster #31: Dongrui Chen

Predictive Modelling of Electricity Consumption Using Energy Trading Data

Dongrui Chen, Chenming Liu, Grisha Patel, Majid Komeili, Carleton University, Carleton University, Carleton University, Carleton University

Canada is the largest energy exporter to, and the second largest energy importer from, the United States. Expanding the electricity trade relies heavily on infrastructure construction. Therefore, being able to predict electricity usage is crucial for the energy development of both parties. Our research evaluates electricity consumption levels in Canada, using Tableau to analyze the electricity transaction data between Canada and the United States from 1990 to 2022. This project also analyzes electricity export and import data between the United States and Canada and uses multiple linear regression to predict trends in electricity consumption. We assume that the amount of electricity consumption is related to geographical conditions and electricity prices. Based on the results of the multiple regression analysis, we found that electricity prices are negatively correlated with electricity consumption, while electricity rates are correlated with geographic location. Additionally, terrain and climate affect the cost of infrastructure construction and maintenance, with areas that are difficult to access or prone to extreme weather conditions likely incurring higher costs. High electricity prices may therefore curb electricity consumption.
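As a minimal sketch of the multiple linear regression step, assuming hypothetical variables for price, geography, and trade volume (the real analysis uses the Canada-US transaction records described above), one could write:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic records standing in for cross-border electricity trade data.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "price_cents_per_kwh": rng.uniform(5, 20, n),
    "region_code": rng.integers(0, 5, n),          # stand-in for geographical conditions
    "exports_gwh": rng.uniform(100, 1000, n),
})
# Assumed relationship: consumption falls as price rises.
df["consumption_gwh"] = (1500 - 40 * df["price_cents_per_kwh"]
                         + 20 * df["region_code"] + 0.3 * df["exports_gwh"]
                         + rng.normal(0, 50, n))

X = df[["price_cents_per_kwh", "region_code", "exports_gwh"]]
model = LinearRegression().fit(X, df["consumption_gwh"])
print(dict(zip(X.columns, model.coef_.round(2))))   # negative price coefficient expected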

Poster Image
Poster
# 32

Donna
Dsouza
Poster Number 32

Evaluating Type-2 Diabetes Across Canada: A Spatial Approach

Donna Dsouza, Samantha Walker, Shafna Kallil, Dr. Olga Baysal, Carleton University, Carleton University, Carleton University, Carleton University

Type-2 diabetes (T2D) constitutes 90% of all diabetes cases and is one of Canada’s most prevalent chronic diseases. T2D is often preventable, but current public health interventions do not sufficiently address socio-economic and socio-demographic factors associated with the disease. Thus, to determine the factors affecting each local health region in Canada, a spatial-temporal analysis was used to map current and future T2D prevalence. This study was based on data from the Canadian Community Health Survey (CCHS) from 2009-2018. Exploratory data analysis was carried out using R and ArcGIS. Socio-economic and socio-demographic factors such as sex, age, income, marital status, education status, immigration/ethnicity, diet, BMI, activity level, and smoking level were used in this analysis. A generalized linear regression model and a geographically weighted regression model were developed to quantify the impact of socio-economic and spatial determinants of health on the prevalence of T2D across Canada. Moran’s I was used to detect local spatial patterns and assess spatial heterogeneity. A predictive model of future diabetes prevalence was built using machine-learning techniques such as k-Nearest Neighbours and Support Vector Machines. These models identify regions of increased T2D prevalence and seek to inform policy makers of regions where preventive intervention may be necessary.
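As a worked illustration of the Moran's I statistic used to detect spatial patterns, the following sketch computes global Moran's I by hand for a handful of hypothetical health regions (the prevalence values and contiguity weights are made up):

import numpy as np

# Hypothetical T2D prevalence (%) for six neighbouring health regions.
prevalence = np.array([7.2, 7.8, 8.1, 5.9, 6.1, 9.0])

# Binary contiguity weights (1 = regions share a border); hypothetical layout.
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
], dtype=float)

n = len(prevalence)
z = prevalence - prevalence.mean()      # deviations from the mean
s0 = W.sum()                            # sum of all spatial weights
morans_i = (n / s0) * (z @ W @ z) / (z @ z)
print(f"Moran's I = {morans_i:.3f}")    # positive values suggest spatial clustering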

Poster Image
Poster
# 33

Mohammad Bin
Yousuf
Poster Number 33

Mapping the Future of Canadian Cities: Leveraging Data to Predict Housing Needs for Immigrants

Mohammad Bin Yousuf, Iman Mahmoud, Izu Maduekwe, Olga Baysal, M.Sc. Computer Science (Carleton University), M.A International Affairs (Carleton University), M.A Economics (Carleton University), Professor (Carleton University)

This project aims to predict the amount of housing required for the anticipated 1.5 million immigrants in Canada's largest cities over the next three years. To achieve this, the project will analyze housing construction and completion rates, along with other factors such as intended immigrant destination, vacancy rates, and median household income, using machine learning algorithms such as Linear Regression and Gradient Boosting Regression. The government's plan to welcome 1.5 million immigrants in the next three years makes it essential to understand the country's urban housing landscape, especially since most immigrants settle in cities. The project will use data from Statistics Canada's repositories, including Shelter-Cost-to-Income Ratio, Vacancy Rates, Immigrant's Intended Destination, and Housing Construction Reports. However, accessing urban data can be challenging since most information is aggregated by province or country-wide. Additionally, collection intervals vary across datasets: some are collected quarterly, while others follow census intervals. This project aims to provide a more nuanced understanding of the Canadian urban housing landscape over the next five years. While there is existing research on this topic, this project approaches the issue from an urban data perspective. Preliminary results have been promising, showing a need to ramp up housing construction in certain cities. By providing insights into the Canadian urban housing landscape, this project has the potential to contribute to evidence-based policy decisions that can better support immigrant settlement and foster inclusive communities.
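As a hedged sketch of the Gradient Boosting Regression approach (the feature names and target below are hypothetical placeholders for the Statistics Canada variables listed above), one could write:

import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Synthetic city-year records of housing and immigration indicators.
rng = np.random.default_rng(7)
n = 300
df = pd.DataFrame({
    "intended_immigrants": rng.uniform(5_000, 120_000, n),
    "vacancy_rate_pct": rng.uniform(0.5, 5.0, n),
    "median_income": rng.uniform(55_000, 110_000, n),
    "housing_starts": rng.uniform(1_000, 40_000, n),
})
# Assumed target: additional housing units needed in the following year.
df["units_needed"] = (0.4 * df["intended_immigrants"] - 2_000 * df["vacancy_rate_pct"]
                      - 0.7 * df["housing_starts"] + rng.normal(0, 2_000, n))

X, y = df.drop(columns="units_needed"), df["units_needed"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("test MAE (units):", round(mean_absolute_error(y_test, model.predict(X_test)), 1))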

Poster Image
Poster
# 34

Huiqian
Chen
Poster Number 34

Machine learning model that classifies Canadian citizens’ financial well-being status and predicts the impact of global shocks

Aziz Al-Najjar, Zakaria Zoundi, Huiqian Chen, Department of Systems and Computer Engineering, Department of Economics, Sprott School of Business

This project utilizes financial well-being data from both pre- and post-COVID-19 surveys conducted by the Financial Consumer Agency of Canada (FCAC) to develop a machine learning model that can predict Canadians' financial well-being status and the potential impact of future global shocks. The project aims to investigate whether the key factors contributing to financial well-being change when COVID-19 is ignored, and whether the same set of features can accurately predict financial well-being in both the pre- and post-COVID-19 datasets. The results show that financial well-being is disproportionately distributed across the nation. The key drivers of financial well-being include household financial situation, capacity to meet monthly expenses, savings, and credit score. In addition, the analysis showed that financial well-being is predicted with higher accuracy when global shocks are not accounted for, underscoring the significant disruptive effects the pandemic had across sectors and individuals. The project advocates for an increased effort from the government to help Canadians across provinces better cope with their finances. Addressing this issue involves creating better jobs, reviewing minimum wages, controlling inflation, supporting debt relief including mortgages, and providing further assistance to households in the lowest income deciles, among other measures. Particular attention should be given to groups that are usually left behind, such as women and single mothers, persons living with disabilities, and Indigenous communities. This project's potential benefits include identifying vulnerable populations and helping decision-makers develop targeted support programs to mitigate the negative effects of global shocks on financial well-being.
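The abstract does not name a specific classifier; as an illustrative sketch only, comparing prediction quality with and without a global-shock indicator could be done as follows (all features, labels, and results below are synthetic and will not reproduce the study's findings):

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic survey-style features and a binary financial well-being label.
rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({
    "household_situation": rng.integers(1, 6, n),    # 1 = very poor ... 5 = very good
    "meets_monthly_expenses": rng.integers(0, 2, n),
    "has_savings": rng.integers(0, 2, n),
    "credit_score_band": rng.integers(1, 5, n),
    "post_covid": rng.integers(0, 2, n),             # stand-in for a global-shock indicator
})
df["high_wellbeing"] = ((df["household_situation"] + df["has_savings"]
                         + df["credit_score_band"] - 2 * df["post_covid"]
                         + rng.normal(0, 1, n)) > 5).astype(int)

clf = RandomForestClassifier(random_state=0)
for name, X in [("with shock indicator", df.drop(columns="high_wellbeing")),
                ("without shock indicator", df.drop(columns=["high_wellbeing", "post_covid"]))]:
    acc = cross_val_score(clf, X, df["high_wellbeing"], cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")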

Poster Image
Poster
# 35

Nitish
Jain
Poster Number 35

Potential of Purchasing Patterns: A Comprehensive Analysis of Customer Behavior and Privacy Awareness

Nitish Jain, Raha Binte Rashid, Wei Huang, MBA, Master of Computer Science, MBA

The project's goal is to leverage online sales consumer purchase data to gather insights that can improve customer experience and increase sales, especially for new start-ups. The study aims to understand customer purchasing behavior, identify consumer preferences, and inform corporate actions by anticipating which consumers will buy what, segmenting the customer base, estimating customer flux, and assessing customers' understanding of how vendors store and use their personally identifiable information (PII). The initiative is important because it may help businesses better understand their target market and develop effective marketing and product development strategies. The study's possible benefits include improved customer experience, more revenue, better business strategies, and a better knowledge of customer behavior. The study's ethical issues include data privacy and the ethical use of consumer data. To achieve the project goals, we perform two types of data analysis: 1) using K-Means clustering to perform customer segmentation, and 2) experimenting with three machine learning models (Logistic Regression, Random Forest, and Decision Tree) to predict trends in customer behavior. Additionally, we aim to build an interactive visualization dashboard in Power BI to detect patterns. We find that training an efficient model requires a rich dataset, and as such expect some of our models to overfit. Similar datasets will be used to further enhance our models, which are expected to help new start-ups make predictions with limited data. Lastly, to understand how consumers think about data privacy, we contacted about 21 randomly selected consumers via email and aim to survey those who express interest.
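As a minimal sketch of the K-Means customer segmentation step (the per-customer features below are hypothetical stand-ins for the purchase data), using scikit-learn:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic per-customer purchase summary (illustrative only).
rng = np.random.default_rng(5)
n = 500
customers = pd.DataFrame({
    "total_spend": rng.gamma(2.0, 150.0, n),
    "orders_per_year": rng.poisson(6, n),
    "avg_basket_size": rng.uniform(1, 12, n),
})

# Standardize features so no single scale dominates the distance metric.
X = StandardScaler().fit_transform(customers)
customers["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(customers.groupby("segment").mean().round(1))   # profile of each segment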

Poster Image
Poster
# 36

Ali
Farhat
Poster Number 36

Trust-Management Module for IoT Systems: An Interaction-based Machine Learning Approach

Ali Farhat, Professor Mohamed Ibn Kahla, Professor Ashraf Matrawy, Dr. Abdelrahman Eldosouky. IoT Lab - Department of Systems and Computer Engineering - Carleton University, IoT Lab - Department of Systems and Computer Engineering - Carleton University, Carleton School of Information Technology - Carleton University, IoT Lab - Department of Systems and Computer Engineering - Carleton University

The Internet of Things (IoT) has swiftly become one of the most popular technological innovations in recent years and is being adopted across vast application domains. However, the deployment of IoT-based solutions has introduced several challenges related to security and privacy. Traditional security solutions are not suitable for IoT systems due to the dynamic nature of heterogeneous IoT systems and their stringent resource constraints. Consequently, trust-based solutions have been introduced to track dynamic behavior in IoT systems by analyzing the interactions within the system. Several IoT trust modules proposed in the literature are implemented at the device level, requiring modification of devices' proprietary software. This could potentially compromise device certification. Moreover, it increases network overhead and devices' resource consumption for trust computation. This work therefore shifts the implementation of the trust module to a higher layer, the IoT access layer. The proposed module establishes trust based on device-system interactions without requiring any additional information. It utilizes communication and security attributes to compute the trust value of an IoT device and analyze its behavior. The module then uses a neural network to determine whether a device is acting maliciously or whether its low trust value is due to performance degradation. Finally, the proposed module analyzes the security-related attributes of IoT devices to identify devices with an unsatisfactory security status.
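The exact neural network is not described in the abstract; as a hedged sketch, a small multilayer perceptron over hypothetical device-interaction attributes (message success rate, a performance score, and failed authentication attempts are assumed features, not the module's actual inputs) could look like:

import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Synthetic per-device features observed at the IoT access layer.
rng = np.random.default_rng(9)
n = 600
features = np.column_stack([
    rng.uniform(0, 1, n),     # normalized message success rate
    rng.uniform(0, 1, n),     # normalized performance score
    rng.integers(0, 10, n),   # failed authentication attempts
])
# Synthetic labels: 1 = malicious behaviour, 0 = benign (including performance degradation).
labels = ((features[:, 2] > 5) & (features[:, 0] < 0.5)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(features, labels, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0))
clf.fit(X_train, y_train)
print("test accuracy:", round(clf.score(X_test, y_test), 3))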

Poster Image
Poster
# 37

Kanksha
Patel
Poster Number 37

Gender Pay Gap: Predictive Modelling and Intersectional Analysis for Data-Driven Solutions

Mikayla Koronkiewicz, Rachel Ernest-Cohen, Kanksha Patel, Master of Public Policy and Administration, Master of Arts in International Affairs, Master of Engineering in Electrical and Computer Engineering

The gender pay gap continues to pose a significant global development challenge, hindering economic growth and exacerbating poverty in communities worldwide. To this end, our project involved the development of a predictive model to assess the gender pay gap in a series of countries using a range of sociocultural and economic input variables. Introduction: Our model relies on a variety of indicators including, but not limited to, GDP per capita, female labour force participation rate, sociocultural indicators, school enrollment, and unemployment. The study has significant implications, as gender equality in the workforce has been identified as a key factor for addressing extreme poverty and increasing the economic output and well-being of communities. As such, our predictive model can be used by policymakers and organizations to design targeted interventions that address the most salient contributing factors and reduce the wage gap. Methodology: Python was used for data cleaning and analysis, while Tableau was used for visualization. Pearson's correlation coefficient will be used to determine the strength and direction of the relationships between variables. A prediction model will be developed using three machine learning models, with the R2 score as the regression metric. The model will focus on predicting future annual earnings and calculating the gender pay gap ratio. Conclusion: This project has several additional applications, including identifying and quantifying the current extent of the gender wage gap in countries not included in our sample, measuring the effectiveness of policy initiatives that employ our model, and promoting equity in employment opportunities and wages.
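As an illustrative sketch of the prediction step (one possible model, here ordinary linear regression, with hypothetical country-level indicators and synthetic earnings), the R2 score and a pay-gap ratio could be computed as follows:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic country-level indicators and average annual earnings by gender.
rng = np.random.default_rng(11)
n = 150
df = pd.DataFrame({
    "gdp_per_capita": rng.uniform(5_000, 80_000, n),
    "female_lfp_rate": rng.uniform(30, 85, n),      # labour force participation, %
    "school_enrollment": rng.uniform(10, 95, n),    # %
    "unemployment_rate": rng.uniform(2, 15, n),     # %
})
df["female_earnings"] = (0.45 * df["gdp_per_capita"] + 120 * df["female_lfp_rate"]
                         + rng.normal(0, 2_000, n))
df["male_earnings"] = (0.55 * df["gdp_per_capita"] + 90 * df["school_enrollment"]
                       + rng.normal(0, 2_000, n))

X = df[["gdp_per_capita", "female_lfp_rate", "school_enrollment", "unemployment_rate"]]
X_train, X_test, y_train, y_test = train_test_split(X, df["female_earnings"], random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("R2:", round(r2_score(y_test, model.predict(X_test)), 3))

# Gender pay gap ratio: female earnings relative to male earnings.
print("mean pay ratio:", round((df["female_earnings"] / df["male_earnings"]).mean(), 3))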

Poster Image
Poster
# 38

Zied
Bouida
Poster Number 38

Carleton-Cisco IoT Testbed: A powerful platform for machine learning

Zied Bouida, Dana Haj Hussein, Ismael AlShiab, Yousef Rafique, Abdallah Jarwan, Sajib Kumar Kuri, and Mohamed Ibnkahla

The Carleton-Cisco Internet of Things (IoT) testbed is being built at Carleton University's IoT Lab. Hosting diverse applications, the testbed adopts a layered architecture composed of four layers: Sensing, Edge, Fog, and Cloud. The testbed is distributed over multiple sites across Carleton University's campus and the City of Ottawa. In this poster, we first present the architecture and features of the testbed. We then give an overview of the applications running on the testbed and their underlying techniques and protocols. In particular, we highlight the use of Edge computing and data analytics in selected applications deployed on the testbed.