23rd Annual A. Paul and Carol C. Schaap Celebration of Undergraduate Research and Creative Activity (2024)

Examining Regression Assumption Violations in Machine Learning Models Using the Wisconsin Longitudinal Study Dataset

Student Author(s)

Grace Mooney Anderson, Hope College
Melia Brewer, Hope College

Faculty Mentor(s)

Dr. Robert Henry, Psychology; Dr. Alyssa Cheadle, Psychology

Document Type

Poster

Event Date

4-12-2024

Abstract

Psychological science increasingly relies on machine learning models for data analysis. Machine learning models may have profound implications for improving human well-being and health (e.g., Walsh et al., 2017). However, like any statistical model, a machine learning model requires that the assumptions of linear regression be met, and the implications of violating these assumptions in machine learning models are unknown. Our study investigates these potentially stark consequences by testing machine learning models on simulated data with built-in assumption violations and on a publicly available dataset. We hypothesize that when regression assumptions are violated, findings will fail to replicate in both the public and the simulated datasets, reflecting an increased risk of Type I and Type II errors (false positive and false negative outcomes). After searching for a large (N > 2,000) public dataset with a robust literature, we selected the Wisconsin Longitudinal Study (WLS). We replicated the published findings of Clark and Lee (2021), who studied how both early- and later-life variables correlate with later-life subjective well-being, and used three common supervised learning models: regularized regression, support vector machine, and random forest. Understanding the consequences of assumption violations in machine learning can enhance the replicability of these models. We envision that these discoveries will offer valuable guidance to psychological researchers employing machine learning techniques.
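The idea of simulated data with a built-in assumption violation can be illustrated with a minimal sketch. This is not the authors' simulation: the abstract does not say which violations were simulated or how, so the choice of heteroscedasticity, the sample size, the number of replications, and the function names `ols_slope_t` and `false_positive_rate` are all assumptions made for illustration. For simplicity the sketch uses ordinary least squares with a single predictor rather than the three machine learning models named above, and a normal-approximation cutoff of 1.96 for the slope test.

```python
import math
import random


def ols_slope_t(x, y):
    """Classical t-statistic for the slope of a one-predictor OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # The classical standard error assumes constant error variance.
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se


def false_positive_rate(heteroscedastic, n=100, reps=1000, seed=0):
    """Share of null datasets (true slope = 0) in which |t| > 1.96.

    Hypothetical helper: every dataset is generated with no real effect,
    so each rejection is a Type I error.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if heteroscedastic:
            # Built-in violation: error SD grows with |x|.
            y = [rng.gauss(0.0, abs(xi)) for xi in x]
        else:
            # Assumption met: constant error SD.
            y = [rng.gauss(0.0, 1.0) for _ in x]
        if abs(ols_slope_t(x, y)) > 1.96:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("homoscedastic:  ", false_positive_rate(False))
    print("heteroscedastic:", false_positive_rate(True))
```

Under these assumptions the first rate should sit near the nominal 5% level, while the second should be noticeably higher, because the classical standard error understates the slope's variability when error variance grows with the predictor. A fuller version of this check would swap in the machine learning models and compare out-of-sample performance rather than a single slope test.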

Recommended Citation

Repository citation: Anderson, Grace Mooney and Brewer, Melia, "Examining Regression Assumption Violations in Machine Learning Models Using the Wisconsin Longitudinal Study Dataset" (2024). 23rd Annual A. Paul and Carol C. Schaap Celebration of Undergraduate Research and Creative Activity (2024). Paper 3.
https://digitalcommons.hope.edu/curca_23/3

Comments

This study relies, in part, on data from the Wisconsin Longitudinal Study (WLS). The WLS has been supported by the National Institute on Aging (NIA) of the National Institutes of Health under award numbers AG-9775, AG-21079, AG-033285, and AG-041868. This research is also supported by the Spencer Foundation, the Vilas Estate Trust, the Graduate School of the University of Wisconsin-Madison, and the National Science Foundation.

Included in

Psychology Commons
