# Using SPSS to Perform Regression Testing on Factors Affecting Student Exam Scores

In this data analysis assignment, we explore two domains: academic performance and health outcomes. The first part focuses on academic success, examining the factors that influence students' exam scores, such as GPA and teaching method. The results show a significant effect of GPA and no significant effect of teaching method on exam scores.

## Problem Description:

In this SPSS assignment, we examined the factors affecting students' exam scores. We considered two predictors, GPA and the active learning condition (taught with or without active learning), both entered as fixed effects, with instructor as a random effect. To analyze these factors, we fitted a Linear Mixed Model using the Restricted Maximum Likelihood (REML) method. The aim was to determine the influence of GPA and the teaching method on exam scores.

## Solution

Mathematical Model & Parameters: To predict students' exam scores, we used the following mathematical model:

y_ij = μ + β_1 x_ij1 + β_2 x_ij2 + d_j + e_ij

where:

• y_ij = Exam score of the ith student (i = 1, …, 324) under the jth instructor (j = 1, …, 10)
• μ = Fixed intercept
• x_ij1 = Active learning condition of the ith student under the jth instructor
• β_1 = Effect of condition
• x_ij2 = GPA score of the ith student under the jth instructor
• β_2 = Effect of GPA score (slope between GPA and exam score)
• d_j = Random effect of the jth instructor; d_j ~ N(0, σ²_d)
• e_ij = Residual/error of the ith student under the jth instructor; e_ij ~ N(0, σ²_e)

Linear Mixed Model - GPA and Teaching Method as Predictors: We applied a Linear Mixed Model to assess the impact of GPA (as a covariate) and the teaching method, that is, the active learning condition, on exam scores, with a random effect for instructor. The model's goodness of fit was assessed using the Akaike Information Criterion (AIC) and Schwarz's Bayesian Criterion (BIC).
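For reference, the two information criteria are commonly defined as below, where L is the maximized likelihood, k is the number of estimated parameters, and n is the sample size. These are the textbook definitions; the exact quantities SPSS uses under REML (which likelihood, which effective sample size) are not shown in this write-up, so treat the correspondence as an assumption.

```latex
\mathrm{AIC} = -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln n
```

For both criteria, a lower value indicates a better balance between fit and model complexity.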

## Results:

• GPA significantly affects exam scores (F(1, 277.509) = 724.365, p < 0.001).
• The condition of active learning does not significantly affect exam scores (F(1, 8.007) = 1.3589, p = 0.272).

Prediction of Examination Score: Using the mathematical model, we predicted exam scores based on GPA and active learning condition. For example, for a student with a GPA of 3.0 in the active learning condition, the estimated exam score was 71.823, or approximately 72.
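A prediction of this kind can be sketched in Python as follows. This is a minimal illustration, assuming the condition is coded 0/1 and the instructor effect is set to zero (a population-level prediction). The function name `predict_exam_score` and the coefficient values are hypothetical placeholders, not the fitted SPSS estimates, which are not listed in this write-up.

```python
def predict_exam_score(mu, beta_condition, beta_gpa, active_learning, gpa):
    """Fixed-effects prediction; the random instructor effect is taken as 0."""
    # Assumed coding: 1 = active learning, 0 = no active learning.
    x_condition = 1.0 if active_learning else 0.0
    return mu + beta_condition * x_condition + beta_gpa * gpa


# Placeholder coefficients for illustration only (NOT the fitted estimates).
example = predict_exam_score(
    mu=20.0, beta_condition=2.0, beta_gpa=15.0,
    active_learning=True, gpa=3.0,
)
print(round(example, 3))
```

Substituting the intercept, condition effect, and GPA slope from the SPSS "Estimates of Fixed Effects" table would give the prediction for a specific student profile.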

Correlation of Exam Score Between Students Under the Same Instructor: The correlation of exam scores between students under the same instructor was computed as 0.49, indicating a moderate correlation.
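In a model with a random instructor effect, this correlation is usually computed as the intraclass correlation: the share of the total variance that is due to instructors. Assuming that is the calculation behind the reported value (the variance estimates themselves are not given here), the formula is:

```latex
\rho = \frac{\sigma_d^2}{\sigma_d^2 + \sigma_e^2} \approx 0.49
```

Here σ²_d is the instructor variance and σ²_e is the residual variance from the model.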

Comparing Models: Schwarz's Bayesian Criterion (BIC) showed that the model using GPA and teaching method as predictors (Model 1) was the better fit, with a lower BIC (1766.509) than the model with only the teaching method (Model 2, BIC = 1925.255).
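The comparison itself is simple arithmetic on the two reported BIC values. A short sketch, with dictionary labels chosen here for illustration:

```python
# BIC values as reported for the two models.
bic = {
    "Model 1 (GPA + teaching method)": 1766.509,
    "Model 2 (teaching method only)": 1925.255,
}

# The preferred model is the one with the lowest BIC.
best = min(bic, key=bic.get)

# Size of the gap between the two models.
delta = round(max(bic.values()) - min(bic.values()), 3)

print(best, delta)
```

The gap between the two models is 158.746 BIC points in favor of Model 1.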

## Question 2: Analyzing Blood Cholesterol Levels

Problem Description: In this part of the assignment, we employed binary logistic regression to analyze factors that predict whether a person meets the blood cholesterol level goal (METGOAL). The variables examined include ADHERENCE, SMOKE, and MONTHS.

Analysis - Binary Logistic Regression: Since the dependent variable is binary (met the goal or not), binary logistic regression was the appropriate method.

## Results:

• The model chi-square test is significant at the 5% level (χ²(3) = 165.272, p < 0.001).
• The Nagelkerke R-square indicates that ADHERENCE, SMOKE, and MONTHS explain 18.7% of the variation in the probability of meeting the blood cholesterol level goal.
• The Hosmer and Lemeshow test indicates that the model fits the data well.
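To show where a Nagelkerke R-square comes from, the sketch below computes it from the log-likelihoods of the null (intercept-only) and fitted models. The function name `nagelkerke_r2` and the input values are hypothetical; the log-likelihoods and sample size for this analysis are not reported here, so the numbers do not reproduce the 18.7% figure.

```python
import math


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke pseudo R-squared from null and fitted log-likelihoods."""
    # Cox and Snell R-squared: 1 - (L_null / L_model) ** (2 / n)
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    # Largest value Cox and Snell can reach: 1 - L_null ** (2 / n)
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    # Nagelkerke rescales Cox and Snell so the maximum is 1.
    return cox_snell / max_cox_snell


# Placeholder inputs for illustration only (NOT taken from the SPSS output).
r2 = nagelkerke_r2(ll_null=-100.0, ll_model=-80.0, n=200)
print(round(r2, 3))
```

Because it is a pseudo R-square, the value is best read as a relative measure of improvement over the null model rather than a literal share of variance.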

Significant Predictors: The independent variables ADHERENCE, SMOKE, and MONTHS are significant predictors of whether a person meets the blood cholesterol level goal.

Formula for Probability of Meeting Cholesterol Goal: We derived a formula to calculate the probability of a patient reaching their blood cholesterol goal based on the given variables.
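The fitted coefficients are not reproduced in this write-up, so the formula is shown in its general form. With b_0 as the intercept and b_1, b_2, b_3 as the coefficients of ADHERENCE, SMOKE, and MONTHS (symbol names assumed here for illustration), binary logistic regression gives:

```latex
P(\text{METGOAL} = 1) = \frac{1}{1 + e^{-(b_0 + b_1\,\text{ADHERENCE} + b_2\,\text{SMOKE} + b_3\,\text{MONTHS})}}
```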

Predicting Probability of Meeting Cholesterol Goal: Using the formula, we calculated the probability of a patient meeting their cholesterol goal under specific conditions.
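A calculation of this kind can be sketched as below. The function name `prob_met_goal`, the coefficient values, the patient values, and the coding of SMOKE (1 = smoker) are all hypothetical placeholders, since the fitted coefficients are not listed in this write-up.

```python
import math


def prob_met_goal(b0, b_adherence, b_smoke, b_months, adherence, smoke, months):
    """Logistic-regression probability of meeting the cholesterol goal."""
    # Linear predictor (log-odds) from the intercept and the three predictors.
    logit = b0 + b_adherence * adherence + b_smoke * smoke + b_months * months
    # The inverse-logit transform maps log-odds to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder coefficients and patient values (NOT the fitted SPSS estimates).
p = prob_met_goal(
    b0=-1.0, b_adherence=0.02, b_smoke=-0.5, b_months=0.05,
    adherence=50, smoke=1, months=10,
)
print(round(p, 3))
```

Replacing the placeholders with the values from the SPSS "Variables in the Equation" table would give the predicted probability for a specific patient.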

Percentage of Correct Predictions: The classification table showed that 64.5% of predictions were correct overall, with high accuracy reported for both categories. This supports the findings of the binary logistic regression model.
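The percentages in a classification table follow directly from its four cell counts. The sketch below shows the arithmetic; the function name `percent_correct` and the counts are hypothetical, because the actual cell counts are not reported here.

```python
def percent_correct(true_neg, false_pos, false_neg, true_pos):
    """Return (overall, goal-not-met, goal-met) percentages of correct predictions."""
    total = true_neg + false_pos + false_neg + true_pos
    overall = 100.0 * (true_neg + true_pos) / total
    # Accuracy within each observed category.
    not_met = 100.0 * true_neg / (true_neg + false_pos)
    met = 100.0 * true_pos / (false_neg + true_pos)
    return overall, not_met, met


# Placeholder counts for illustration only (NOT the SPSS classification table).
overall, not_met, met = percent_correct(
    true_neg=60, false_pos=40, false_neg=30, true_pos=70
)
print(overall, not_met, met)
```

Reporting the per-category percentages alongside the overall figure shows whether the model predicts both outcomes comparably well.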