Faculty Mentor(s)
Dr. Robert Henry, Psychology; Dr. Alyssa Cheadle, Psychology
Document Type
Poster
Event Date
4-12-2024
Abstract
Psychological scientists increasingly rely on machine learning models for data analysis. Machine learning models may have profound implications for improving human well-being and health (e.g., Walsh et al., 2017). However, like any statistical model, machine learning requires that the assumptions of linear regression be met, and the implications of violating these assumptions in machine learning models are unknown. Our study investigates these potentially stark consequences by testing machine learning models on simulated data with built-in assumption violations, as well as on a publicly available dataset. We hypothesize that when regression assumptions are violated, findings will fail to replicate in the public and simulated datasets, increasing the risk of Type I/II (false positive/negative) errors. After searching for a large (N > 2,000) public dataset with a robust literature, we selected the Wisconsin Longitudinal Study (WLS). We replicated the published findings of Clark and Lee (2021), who studied how both early- and later-life variables correlate with later-life subjective well-being, and used three common supervised learning models: regularized regression, support vector machine, and random forest. Understanding the consequences of assumption violations in machine learning can enhance the replicability of these models. We envision that these discoveries will offer valuable guidance to psychological researchers employing machine learning techniques.
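The idea of simulating data with a built-in assumption violation can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes heteroscedasticity (error variance that grows with the predictor) as the violation, and it uses a plain ordinary least-squares slope test in place of the three machine learning models named above. The sample size, number of simulations, seed, and function names are all hypothetical choices made for illustration. Because the true slope is zero in every simulated dataset, each "significant" result is a Type I error.

```python
import math
import random


def simulate_null_data(n, heteroscedastic, rng):
    """Draw (x, y) pairs in which y is unrelated to x (true slope = 0)."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Illustrative violation: the error SD grows with |x| instead of
        # staying constant, which breaks the homoscedasticity assumption.
        sd = abs(x) if heteroscedastic else 1.0
        xs.append(x)
        ys.append(rng.gauss(0.0, sd))
    return xs, ys


def slope_is_significant(xs, ys, critical=1.96):
    """Fit simple OLS and test the slope with the usual (naive) standard error.

    The 1.96 cutoff is the large-sample normal approximation for alpha = .05.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_error = math.sqrt(rss / (n - 2) / sxx)
    return abs(slope / std_error) > critical


def false_positive_rate(heteroscedastic, n=200, n_sims=500, seed=0):
    """Share of null datasets in which the slope test rejects (Type I errors)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        xs, ys = simulate_null_data(n, heteroscedastic, rng)
        if slope_is_significant(xs, ys):
            hits += 1
    return hits / n_sims


rate_met = false_positive_rate(heteroscedastic=False)
rate_violated = false_positive_rate(heteroscedastic=True)
print(f"Type I error rate, assumption met:      {rate_met:.3f}")
print(f"Type I error rate, assumption violated: {rate_violated:.3f}")
```

Under these assumptions, the rate with the assumption met should sit near the nominal .05 level, while the violated condition should reject noticeably more often. The same comparison could in principle be repeated with other violations or with machine learning models as the estimator.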
Recommended Citation
Repository citation: Anderson, Grace Mooney and Brewer, Melia, "Examining Regression Assumption Violations in Machine Learning Models Using the Wisconsin Longitudinal Study Dataset" (2024). 23rd Annual A. Paul and Carol C. Schaap Celebration of Undergraduate Research and Creative Activity (2024). Paper 3.
https://digitalcommons.hope.edu/curca_23/3
April 12, 2024. Copyright © 2024 Hope College, Holland, Michigan.
Comments
This study relies, in part, on data from the Wisconsin Longitudinal Study (WLS). The WLS has been supported by the NIA of the National Institutes of Health under award numbers AG-9775, AG-21079, AG-033285, and AG-041868. This research is also supported by the Spencer Foundation, the Vilas Estate Trust, the Graduate School of the University of Wisconsin-Madison, and by the National Science Foundation.