Aug 2021 - May 2022

Undergraduate Data Science Researcher at The Data Mine - Purdue University

"Provide the Purdue community with recreational and wellness activities that contribute to learning and the pursuit of an active, healthy lifestyle."

2022 Data Mine Corporate Partners Symposium

Participated in the 2022 Data Mine Corporate Partners Symposium, where undergraduate students showcased their research findings. At the event, we presented our project, discussing its objectives, the methodologies we employed, and the results of our year-long research collaboration with Purdue Student Life & Purdue RecWell.

Data Management

Used the R programming language to manage and analyze large datasets. Combined multiple datasets spanning several years and integrated data from both Purdue Student Life and Purdue RecWell. The merged data was then sorted into groups based on classifications, using R's tools for handling large datasets.
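The project itself was done in R. The Python sketch below only illustrates the general shape of the steps described above: stacking per-year tables, joining two sources on matching keys, and grouping the result. All field names (`student_id`, `classification`, `term_gpa`, `visits`) and the sample rows are hypothetical, and the join assumes each (student, year) pair appears at most once in the second source.

```python
from collections import defaultdict

def combine_years(*yearly_tables):
    """Stack per-year tables (lists of dicts) into one table, tagging each row with its year."""
    combined = []
    for year, rows in yearly_tables:
        for row in rows:
            combined.append({**row, "year": year})
    return combined

def inner_join(left, right, keys):
    """Merge rows of `left` and `right` that agree on every field in `keys`.

    Assumes each key combination appears at most once in `right`.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:  # keep only rows present in both sources
            joined.append({**row, **match})
    return joined

def group_by(rows, field):
    """Split rows into groups keyed by the value of `field`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    return dict(groups)

# Hypothetical example data: IDs, field names, and values are made up.
student_life = combine_years(
    (2020, [{"student_id": 1, "classification": "Freshman", "term_gpa": 3.4}]),
    (2021, [{"student_id": 1, "classification": "Sophomore", "term_gpa": 3.6},
            {"student_id": 2, "classification": "Freshman", "term_gpa": 2.9}]),
)
recwell = combine_years(
    (2020, [{"student_id": 1, "visits": 12}]),
    (2021, [{"student_id": 1, "visits": 30}, {"student_id": 2, "visits": 4}]),
)
merged = inner_join(student_life, recwell, ("student_id", "year"))
by_classification = group_by(merged, "classification")
```

Joining on the composite key (student, year) rather than the student ID alone keeps one merged row per student per year, so records from different years are not collapsed together.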

PCA & K-Means

Responsible for applying Principal Component Analysis (PCA) to transform high-dimensional datasets into lower-dimensional representations, preserving as much of the variance as possible while minimizing information loss. The K-means clustering algorithm was then applied to the reduced-dimensional data, grouping individual data points into distinct clusters. Two-dimensional and three-dimensional visualizations of these clusters were generated to reveal the underlying patterns and relationships within the dataset.
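In R, `prcomp` and `kmeans` from the base `stats` package are the usual tools for this pipeline. As a language-neutral illustration, here is a minimal from-scratch Python sketch of the same two steps, assuming numeric rows of equal length: PCA via power iteration with deflation on the sample covariance matrix, followed by Lloyd's K-means. The farthest-point seeding is a deterministic choice made here for reproducibility (not k-means++), and the two synthetic blobs are made-up data.

```python
import math
import random

def pca(data, k, iters=500, seed=0):
    """Project rows of `data` onto their top-k principal components.

    Returns (scores, eigenvalues); each eigenvalue is the variance captured by a component.
    """
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    X = [[row[j] - means[j] for j in range(d)] for row in data]  # center each feature
    C = [[sum(r[i] * r[j] for r in X) / (n - 1) for j in range(d)] for i in range(d)]  # covariance
    rng = random.Random(seed)
    components, eigenvalues = [], []
    for _ in range(k):
        # Power iteration converges to the dominant eigenvector of C.
        v = [rng.uniform(-1, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))  # Rayleigh quotient
        components.append(v)
        eigenvalues.append(lam)
        # Deflation: subtract the found component so the next pass finds the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * v[j] for j in range(d)) for v in components] for r in X]
    return scores, eigenvalues

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's K-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids

# Synthetic 3-D data: two well-separated blobs (made up for illustration).
blob_a = [[0, 0, 0], [0.5, 0.2, 0.1], [0.1, 0.4, 0.3], [0.3, 0.1, 0.5]]
blob_b = [[10, 10, 10], [10.4, 9.8, 10.2], [9.7, 10.3, 10.1], [10.2, 10.1, 9.6]]
scores, eigvals = pca(blob_a + blob_b, k=2)
labels, centroids = kmeans(scores, k=2)
```

Reducing to `k=2` or `k=3` components gives coordinates that can be plotted directly, which is one reason to cluster in the reduced space; `sum(eigvals)` divided by the total variance indicates how much of the original spread those components retain.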

Cluster Analysis

After creating clusters through PCA and K-means, the samples within each cluster were analyzed to identify the patterns and relationships they share. Tableau was then used to create bar graphs visualizing differences between clusters in metrics such as mean term GPA and z-scores of standardized test scores. Outliers within clusters were investigated to identify plausible reasons for their occurrence.
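The charts themselves were built in Tableau. The Python sketch below shows the kind of per-cluster summary such charts are based on: a cluster-wise mean of a metric, z-scores of a score column, and a simple outlier flag. The rule of flagging points more than 2 standard deviations from their own cluster's mean is an assumption chosen for illustration, not the criterion used in the project, and all numbers are made up.

```python
from collections import defaultdict
from statistics import mean, pstdev

def zscores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def cluster_means(labels, values):
    """Mean of `values` within each cluster label."""
    groups = defaultdict(list)
    for label, v in zip(labels, values):
        groups[label].append(v)
    return {label: mean(vs) for label, vs in sorted(groups.items())}

def within_cluster_outliers(labels, values, threshold=2.0):
    """Indices whose value lies more than `threshold` std devs from its own cluster's mean.

    The 2-standard-deviation default is an illustrative assumption.
    """
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    flagged = []
    for idx in groups.values():
        zs = zscores([values[i] for i in idx])
        flagged.extend(i for i, z in zip(idx, zs) if abs(z) > threshold)
    return sorted(flagged)

# Hypothetical data: cluster labels, term GPAs, and test scores are made up.
labels = [0] * 10 + [1] * 3
term_gpa = [3.0] * 9 + [1.0] + [3.8, 3.6, 3.7]
test_scores = [1000, 1200, 1400]

gpa_by_cluster = cluster_means(labels, term_gpa)
test_z = zscores(test_scores)
outliers = within_cluster_outliers(labels, term_gpa)
```

Computing the z-scores within each cluster, rather than over the whole dataset, flags samples that are unusual relative to their own group, which matches the goal of investigating outliers inside clusters.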