# Statistical Analysis and Interpretation of GSS18 Dataset: Exploring Relationships between Variable Sets

This analysis of the GSS18 dataset examines relationships between six pairs of variables, ranging from the connection between a respondent's mother's religion during childhood and the respondent's current religious preference to the relationship between spending evenings at a bar and age. Each pair is tested with Chi-Square, ANOVA, a t-test, or correlation analysis, depending on the variables' levels of measurement. The results cover mental health, work patterns, income, and the correlation between height and weight.

## Problem Description:

The data analysis assignment involves conducting statistical analyses using the GSS18 dataset to explore relationships between different sets of variables. Students are required to determine the appropriate hypothesis test (Chi-Square, t-test, ANOVA, or Correlation) for each set of variables and provide detailed interpretations of the statistical outputs. Below are the solutions for the assignment:
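The choice of test in each solution follows from the variables' levels of measurement. The sketch below encodes that rule of thumb as it is applied in this assignment; the function name `choose_test` and its parameters are hypothetical, and the rule is a simplification rather than a complete decision procedure. It assumes that the nominal variables paired with a t-test have exactly two categories.

```python
def choose_test(level_a: str, level_b: str, groups: int = 0) -> str:
    """Pick a test from two measurement levels (simplified rule of thumb).

    `groups` is the number of categories of the categorical variable
    when exactly one of the two variables is categorical.
    """
    categorical = {"nominal", "ordinal"}
    a_cat = level_a in categorical
    b_cat = level_b in categorical
    if a_cat and b_cat:
        return "chi-square"   # two categorical variables
    if not a_cat and not b_cat:
        return "correlation"  # two ratio variables
    # One categorical and one ratio variable: compare group means
    return "t-test" if groups == 2 else "anova"


print(choose_test("nominal", "nominal"))          # MARELKID and RELIG
print(choose_test("ordinal", "ratio", groups=7))  # SOCBAR and AGE
print(choose_test("ratio", "ratio"))              # MNTLHLTH and MOREDAYS
print(choose_test("nominal", "ratio", groups=2))  # MUSTWORK and REALRINC (two groups assumed)
```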

### A) Mother's Religion and Religious Preference

Variables:

• #478 – MARELKID - Nominal
• #768 – RELIG (R’s religion preference) - Nominal

Chi-Square Tests

| | Value | df | Asymp. Sig. (2-sided) |
|---|---|---|---|
| Pearson Chi-Square | 2706.759 (a) | 110 | 0.000 |
| Likelihood Ratio | 932.547 | 110 | 0.000 |
| Linear-by-Linear Association | 150.306 | 1 | 0.000 |
| N of Valid Cases | 1131 | | |

a. 115 cells (87.1%) have expected count less than 5. The minimum expected count is .00.

Solution: For these variables, a chi-square test was performed.

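The p-value is < 0.05, so the null hypothesis (H0) of no association is rejected in favor of the alternative hypothesis (H1). This indicates a statistically significant relationship between a respondent's mother's religion when the respondent was a child and the respondent's current religious preference. The association is moderate, as indicated by Cramér's V (0.489) and lambda (0.429). Because 87.1% of the cells have an expected count below 5, the chi-square approximation is unreliable here and the result should be read with caution. Further analysis of the crosstabulations is required to understand the precise relationship.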
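As a check on the reported effect size, Cramér's V can be recomputed from the chi-square statistic and the sample size. The sketch below assumes an 11 x 12 crosstabulation. That shape is not stated in the output; it is inferred from df = 110 = (11 - 1)(12 - 1) and from 115 cells being 87.1% of 132 cells. The function name `cramers_v` is illustrative.

```python
import math


def cramers_v(chi2: float, n: int, n_rows: int, n_cols: int) -> float:
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


# Reported chi-square and N; the 11 x 12 table shape is an assumption
v = cramers_v(chi2=2706.759, n=1131, n_rows=11, n_cols=12)
print(round(v, 3))  # 0.489
```

Under that assumption the result agrees with the Cramér's V of 0.489 given in the interpretation.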

### B) Spending Time at a Bar and Age

Variables:

• #869 – SOCBAR (spend evening at bar) - Ordinal (7 groups)
• #28 – AGE (respondent's age) - Ratio

Solution: For these variables, an ANOVA test was conducted.

ANOVA: Age of respondent

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 34702.745 | 6 | 5783.791 | 19.319 | 0.000 |
| Within Groups | 461939.962 | 1543 | 299.378 | | |
| Total | 496642.707 | 1549 | | | |

The p-value is < 0.05, so the null hypothesis of equal mean ages across the seven groups is rejected in favor of the alternative hypothesis (H1): at least one group's mean age differs. Specifically, respondents who never spend evenings at a bar are 5 to 17 years younger than those who spend evenings at a bar several times a week.
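The F statistic in the ANOVA table is the ratio of the between-groups mean square to the within-groups mean square, and each mean square is a sum of squares divided by its degrees of freedom. The sketch below recomputes these from the reported sums of squares. It also computes eta squared, an effect-size measure that is not part of the original output and is added here only as an illustration.

```python
# Values copied from the ANOVA table
ss_between, df_between = 34702.745, 6
ss_within, df_within = 461939.962, 1543

ms_between = ss_between / df_between  # mean square between groups
ms_within = ss_within / df_within     # mean square within groups
f_stat = ms_between / ms_within

# Eta squared: share of total variation in age that lies between groups
# (an addition for illustration, not reported in the original output)
eta_squared = ss_between / (ss_between + ss_within)

print(round(ms_between, 3), round(ms_within, 3), round(f_stat, 3))
print(round(eta_squared, 3))
```

The recomputed mean squares and F match the table. Eta squared comes out at roughly 0.07, which suggests that bar attendance accounts for only a small share of the variation in age even though the group differences are statistically significant.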

### C) Poor Mental Health and Extra Work Hours

Variables:

• #524 – MNTLHLTH (days of poor mental health in the past 30 days) - Ratio
• #527 – MOREDAYS (days per month R worked extra hours) - Ratio

Solution: A correlation analysis was performed for these variables.

Correlations

| | | Days of poor mental health past 30 days | Days per month R work extra hours |
|---|---|---|---|
| Days of poor mental health past 30 days | Pearson Correlation | 1 | 0.026 |
| | Sig. (2-tailed) | | 0.327 |
| | N | 1408 | 1391 |
| Days per month R work extra hours | Pearson Correlation | 0.026 | 1 |
| | Sig. (2-tailed) | 0.327 | |
| | N | 1391 | 1401 |

The p-value (0.327) is > 0.05, so we fail to reject the null hypothesis (H0): there is no statistically significant linear relationship between the number of days of poor mental health and the number of days with extra work hours in a month.
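The significance of a Pearson correlation can be checked from r and N alone, using the test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below is an approximation under two stated assumptions: it uses the rounded r = 0.026 from the table, and it replaces the t distribution with the standard normal, which is reasonable for a sample this large. The function name `correlation_p_value` is illustrative.

```python
import math
from statistics import NormalDist


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for H0: no linear correlation.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2) and treats t as standard
    normal, which is a large-sample approximation.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return 2 * (1 - NormalDist().cdf(abs(t)))


p = correlation_p_value(r=0.026, n=1391)
print(round(p, 2))
```

The result lands close to the reported Sig. of 0.327; the small gap is expected because r is rounded to three decimals in the output.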

### D) Weeks Worked Last Year and Depression

Variables:

• #983 – WEEKSWRK (weeks R worked last year) - Ratio
• #177 – told have depression - Nominal

Independent Samples Test

| Weeks R worked last year | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 18.718 | .000 | -2.683 | 1389 | .007 | -2.106 | .785 | -3.645 | -.566 |
| Equal variances not assumed | | | -2.362 | 353.816 | .019 | -2.106 | .891 | -3.859 | -.353 |

Solution:

A t-test was conducted for these variables. Levene's test is significant (Sig. = .000), so equal variances cannot be assumed and the "Equal variances not assumed" row applies. Its p-value (.019) is < 0.05, so the null hypothesis is rejected: the average number of weeks worked in the past year differs between individuals who have been told they have depression and those who have not. With 95% confidence, those without depression worked about 0.4 to 3.9 weeks more than those with depression.
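Reading the independent-samples output involves two steps: Levene's test decides which row to use, and the t statistic in that row is the mean difference divided by its standard error. The sketch below illustrates both with the reported values; the helper name `select_row` and the 0.05 threshold are assumptions made for this example.

```python
def select_row(levene_sig: float, alpha: float = 0.05) -> str:
    """Choose which row of the independent-samples table to read."""
    if levene_sig < alpha:
        return "Equal variances not assumed"
    return "Equal variances assumed"


row = select_row(levene_sig=0.000)

# t is the mean difference divided by its standard error
# (values taken from the "Equal variances not assumed" row)
t_unequal = -2.106 / 0.891

print(row)
print(round(t_unequal, 2))
```

The recomputed t agrees with the reported -2.362 up to the rounding of the inputs.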

### E) Height and Weight

Variables:

• #305 – HEIGHT - Ratio
• #984 – WEIGHT - Ratio

Solution: Correlation analysis was used for these ratio variables.

Correlations

| | | R weighs how much | R is how tall |
|---|---|---|---|
| R weighs how much | Pearson Correlation | 1 | 0.457 |
| | Sig. (2-tailed) | | 0.000 |
| | N | 1380 | 1374 |
| R is how tall | Pearson Correlation | 0.457 | 1 |
| | Sig. (2-tailed) | 0.000 | |
| | N | 1374 | 1402 |

The p-value is < 0.05, so the correlation is statistically significant: there is a moderate positive association between a person's height and weight (r = 0.457). About 20.88% of the variance in weight (r² = 0.2088) is explained by height, while the remaining 79.12% is attributable to other factors.
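The percentages in this interpretation come from squaring the correlation coefficient, which gives the coefficient of determination. A minimal sketch of that arithmetic, using the reported r (variable names are illustrative):

```python
r = 0.457  # Pearson correlation between height and weight, from the table

r_squared = r ** 2                      # coefficient of determination
explained_pct = 100 * r_squared         # share associated with height
unexplained_pct = 100 - explained_pct   # share left to other factors

print(f"{explained_pct:.2f}% explained, {unexplained_pct:.2f}% unexplained")
# prints "20.88% explained, 79.12% unexplained"
```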

### F) Mandatory Overtime and Real Income

Variables:

• #532 – MUSTWORK (mandatory to work extra hours) - Nominal
• #722 – REALRINC (R’s income in constant dollars) - Ratio

Solution: For these variables, a t-test was conducted.

Independent Samples Test

| R's income in constant \$ | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | .038 | .844 | 1.174 | 1192 | .241 | 2127.793 | 1812.996 | -1429.225 | 5684.811 |
| Equal variances not assumed | | | 1.180 | 621.339 | .239 | 2127.793 | 1803.898 | -1414.682 | 5670.268 |

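Levene's test is not significant (Sig. = .844), so equal variances are assumed. The p-value (.241) is > 0.05, so we fail to reject the null hypothesis (H0): there is no statistically significant difference in mean income in constant dollars between those required to work overtime and those who are not. No further analysis is required because the null hypothesis was not rejected.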
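A non-significant t-test goes hand in hand with a 95% confidence interval for the mean difference that contains zero. The sketch below rebuilds that interval from the reported mean difference and standard error. It uses a normal critical value instead of the t distribution, a large-sample approximation, so the bounds differ slightly from the reported ones. The function name `approx_ci` is illustrative.

```python
from statistics import NormalDist


def approx_ci(mean_diff: float, std_error: float, confidence: float = 0.95):
    """Large-sample confidence interval using a normal critical value."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean_diff - z * std_error, mean_diff + z * std_error


# Values from the "Equal variances assumed" row
lower, upper = approx_ci(mean_diff=2127.793, std_error=1812.996)
contains_zero = lower < 0 < upper

print(round(lower, 1), round(upper, 1), contains_zero)
```

The interval spans zero, which is consistent with a p-value above 0.05: an income difference of zero between the two groups cannot be ruled out.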