In this R programming assignment, we analyze a dataset of 60 observations and 6 variables collected in the LWM-LMP program. Three variables are continuous (BMI, systolic blood pressure, and age) and three are categorical (time spent in Australia, socioeconomic status, and presence of hypertension). The analysis focuses on applying statistical techniques in R to answer three research questions.
Part 1: Investigating BMI Across Time in Australia
Research Question: Does mean BMI differ significantly among participants grouped by the time they have spent in Australia, classified into five categories: less than 12 months, 1-5 years, 6-10 years, 11-20 years, and more than 20 years?
We began by calculating descriptive statistics for BMI in each time category. A table of these statistics and a boxplot are presented below.
Table 1: Descriptive Statistics for BMI by Time
Figure 1: Boxplot of BMI against Time
The descriptive statistics and the boxplot showed no marked differences in BMI across the durations of time spent in Australia, although descriptive summaries alone cannot establish statistical significance. Two outliers were identified: one in the '>20 years' category and one in the '<12 months' category.
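A minimal R sketch of how the group summaries and the boxplot could be produced. The LWM-LMP data are not reproduced in this report, so the code simulates a hypothetical data frame `lwm`; the column names `BMI` and `Time`, the factor labels, and all values are assumptions for illustration and will not reproduce Table 1.

```r
# Hypothetical stand-in for the LWM-LMP data (simulated values, assumed column names)
set.seed(1)
time_levels <- c("<12 months", "1-5 years", "6-10 years", "11-20 years", ">20 years")
lwm <- data.frame(
  Time = factor(rep(time_levels, each = 12), levels = time_levels),
  BMI  = round(rnorm(60, mean = 26, sd = 4), 1)
)

# Descriptive statistics of BMI within each Time category (one row per category)
bmi_summary <- t(sapply(split(lwm$BMI, lwm$Time), function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x),
    median = median(x), min = min(x), max = max(x))
}))
print(round(bmi_summary, 2))

# Boxplot of BMI by Time; points more than 1.5 box-lengths beyond the box are drawn as outliers
bp <- boxplot(BMI ~ Time, data = lwm, xlab = "Time in Australia", ylab = "BMI")
print(bp$out)  # BMI values flagged as outliers, if any
```

Setting explicit factor levels keeps the categories in chronological order in both the table and the plot, rather than R's default alphabetical order.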
To test formally whether mean BMI differs across the time categories, we conducted a one-way ANOVA. The null and alternative hypotheses were defined as follows:
- H0: μ1 = μ2 = μ3 = μ4 = μ5 (all five population mean BMIs are equal)
- H1: At least one population mean differs from the others
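For reference, the one-way ANOVA statistic compares between-group to within-group variability. With k = 5 time categories, N = 60 participants, group sizes n_j, group means ȳ_j, and grand mean ȳ, its degrees of freedom are k − 1 = 4 and N − k = 55, matching the F(4,55) reported for this analysis:

$$
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}}/(k-1)}{SS_{\text{within}}/(N-k)},
\qquad
SS_{\text{between}} = \sum_{j=1}^{k} n_j(\bar{y}_j - \bar{y})^2,
\qquad
SS_{\text{within}} = \sum_{j=1}^{k}\sum_{i=1}^{n_j} (y_{ij} - \bar{y}_j)^2
$$

Under H0, F follows an F distribution with (4, 55) degrees of freedom.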
The significance level was set at 5%. The ANOVA results are summarized in the table below.
|Source of Variation|DF|Sum Sq|Mean Sq|F-value|P-value|
Table 2: Analysis of Variance for BMI by Time
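A table in the layout of Table 2 is what R's `aov()` followed by `summary()` prints. The minimal sketch below simulates a hypothetical data frame `lwm` (assumed column names `BMI` and `Time`, simulated values rather than the study data) so that it runs on its own; its numbers will not match Table 2.

```r
# Hypothetical stand-in data (simulated, not the LWM-LMP observations)
set.seed(1)
time_levels <- c("<12 months", "1-5 years", "6-10 years", "11-20 years", ">20 years")
lwm <- data.frame(
  Time = factor(rep(time_levels, each = 12), levels = time_levels),
  BMI  = rnorm(60, mean = 26, sd = 4)
)

# One-way ANOVA: does mean BMI differ across the five Time categories?
fit_aov   <- aov(BMI ~ Time, data = lwm)
aov_table <- summary(fit_aov)[[1]]   # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(aov_table)

# Decision at the 5% significance level
p_value  <- aov_table[["Pr(>F)"]][1]
decision <- if (p_value < 0.05) "Reject H0" else "Fail to reject H0"
print(decision)
```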
The ANOVA did not yield a statistically significant result (F(4,55) = 2.45, p = 0.057). Because p is greater than the 0.05 significance level, we fail to reject H0: the main effect of time is not statistically significant.
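As a quick arithmetic check, the reported p-value follows from the F statistic and its degrees of freedom as the upper-tail probability of the F distribution, which base R's `pf()` computes:

```r
# Upper-tail probability of F = 2.45 on 4 and 55 degrees of freedom
p_anova <- pf(2.45, df1 = 4, df2 = 55, lower.tail = FALSE)
print(round(p_anova, 3))  # about 0.057
print(p_anova > 0.05)     # TRUE: not significant at the 5% level
```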
In conclusion, mean BMI did not differ significantly among participants according to the time they had spent in Australia.
Part 2: Examining the Relationship between Age and Systolic Blood Pressure
Research Question: Is there a linear relationship between systolic blood pressure and age, and is this relationship statistically significant?
To address this question, we began by creating a scatterplot to visualize the potential linear relationship between age and systolic blood pressure.
Figure 2: Scatterplot of Age and Systolic Blood Pressure
The scatterplot suggested a weak linear relationship between age and systolic blood pressure. We proceeded to conduct a linear regression with systolic blood pressure as the dependent variable and age as the predictor.
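A minimal R sketch of the scatterplot and the simple linear regression, assuming hypothetical columns `Age` and `SBP` in a data frame `lwm`. The data are simulated with an arbitrary slope purely so the code runs; the fitted values will not reproduce Table 3.

```r
# Hypothetical stand-in data (simulated, not the LWM-LMP observations)
set.seed(2)
lwm <- data.frame(Age = round(runif(60, min = 20, max = 70)))
lwm$SBP <- 110 + 0.3 * lwm$Age + rnorm(60, sd = 15)

# Scatterplot with the least-squares line overlaid
plot(SBP ~ Age, data = lwm, xlab = "Age (years)", ylab = "Systolic blood pressure")
fit_lm <- lm(SBP ~ Age, data = lwm)   # SBP is the response, Age the predictor
abline(fit_lm)

# Coefficient table (Estimate, Std. Error, t value, Pr(>|t|)) and R-squared
coef_table <- summary(fit_lm)$coefficients
print(coef_table)
print(summary(fit_lm)$r.squared)
```

The `Age` row of the coefficient table gives the slope estimate and the t test of H0: β_age = 0 on n − 2 = 58 residual degrees of freedom.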
The null and alternative hypotheses for the regression were defined as follows:
- H0: β_age = 0 (no linear relationship between age and systolic blood pressure)
- H1: β_age ≠ 0
The significance level was set at 5%. The results of the linear regression analysis are displayed in the table below.
Table 3: Linear Regression for Systolic Blood Pressure by Age
The model explained a small but statistically significant proportion of the variance in systolic blood pressure (R² = 0.10, about 10%). The effect of age was positive and statistically significant (β = 0.33, t(58) = 2.48, p = 0.016).
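For a regression with a single predictor, the slope's t statistic and R² are linked exactly by R² = t²/(t² + df), where df = n − 2 = 58. A short R check confirms that the reported t, R², and p-value are mutually consistent:

```r
t_age  <- 2.48
df_res <- 60 - 2                               # n minus two estimated coefficients
r2     <- t_age^2 / (t_age^2 + df_res)         # implied R-squared
p_age  <- 2 * pt(-abs(t_age), df = df_res)     # two-sided p-value for the slope
print(round(c(R2 = r2, p = p_age), 3))
```

The implied R² rounds to 0.10 and the two-sided p-value falls between 0.01 and 0.02, in line with the reported figures.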
Based on the regression analysis, there was a weak, positive, and statistically significant relationship between age and systolic blood pressure.
Part 3: Exploring the Association Between Time in Australia and Hypertension
Research Question: Do the data provide evidence that the length of time spent in Australia (Time) is associated with whether participants have hypertension?
To address this research question, we used a chi-squared test of independence. We first created a stacked bar chart showing the proportion of participants with and without hypertension in each time category.
Figure 3: Stacked Bar Chart of Hypertension by Time
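A minimal R sketch of how such a stacked bar chart of proportions could be drawn. The data frame `lwm`, its columns `Time` and `Hypertension`, and the "No"/"Yes" coding are hypothetical, and the values are simulated rather than taken from the study.

```r
# Hypothetical stand-in data (simulated, not the LWM-LMP observations)
set.seed(3)
time_levels <- c("<12 months", "1-5 years", "6-10 years", "11-20 years", ">20 years")
lwm <- data.frame(
  Time = factor(rep(time_levels, each = 12), levels = time_levels),
  Hypertension = factor(sample(c("No", "Yes"), 60, replace = TRUE), levels = c("No", "Yes"))
)

# Cross-tabulate, then convert counts to within-Time proportions (each column sums to 1)
counts <- table(lwm$Hypertension, lwm$Time)
props  <- prop.table(counts, margin = 2)

# Stacked bars: one bar per Time category, one segment per hypertension status
barplot(props, xlab = "Time in Australia", ylab = "Proportion",
        legend.text = rownames(props))
```

Plotting proportions rather than raw counts makes categories of different sizes directly comparable.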
Based on the bar chart, there appeared to be some association between hypertension and time. The null and alternative hypotheses for the chi-squared test were defined as follows:
- H0: Time in Australia and hypertension are independent (no association)
- H1: Time in Australia and hypertension are associated
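For reference, the Pearson chi-squared statistic compares observed counts O_ij with the counts E_ij expected under independence, computed from the row totals R_i, column totals C_j, and sample size N. Assuming hypertension is recorded as present or absent (r = 2 rows) across c = 5 time categories, the test has 4 degrees of freedom:

$$
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i\,C_j}{N},
\qquad
\text{df} = (r-1)(c-1) = (2-1)(5-1) = 4
$$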
The significance level was set at 5%. The results of the chi-squared test are presented in the table below.
Table 4: Chi-squared Test of Independence for Time and Hypertension
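A minimal R sketch of the test with `chisq.test()`, using hypothetical simulated variables `Time` and `Hypertension` (not the study data), so the output will not match Table 4.

```r
# Hypothetical stand-in data (simulated, not the LWM-LMP observations)
set.seed(3)
time_levels  <- c("<12 months", "1-5 years", "6-10 years", "11-20 years", ">20 years")
Time         <- factor(rep(time_levels, each = 12), levels = time_levels)
Hypertension <- factor(sample(c("No", "Yes"), 60, replace = TRUE), levels = c("No", "Yes"))

# Pearson chi-squared test of independence on the 2 x 5 contingency table
counts <- table(Hypertension, Time)
res    <- chisq.test(counts)
print(res)                     # X-squared, df, p-value

# Expected counts under H0; R warns that the chi-squared approximation
# may be unreliable when some of these are small (commonly below 5)
print(round(res$expected, 1))
```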
The chi-squared test did not yield a statistically significant result (χ² = 9.242, p = 0.055). Because p is greater than 0.05, we fail to reject H0: the evidence of an association between time and hypertension was not strong enough to be considered significant at the 5% level.
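As an arithmetic check, the reported p-value is the upper-tail probability of the chi-squared distribution at the observed statistic. This assumes hypertension has two levels, giving (5 − 1) × (2 − 1) = 4 degrees of freedom:

```r
# Upper-tail probability of chi-squared = 9.242 on 4 degrees of freedom
p_chisq <- pchisq(9.242, df = (5 - 1) * (2 - 1), lower.tail = FALSE)
print(round(p_chisq, 3))  # about 0.055, just above the 5% threshold
```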
Our analysis concluded that there was no statistically significant association between hypertension and the time participants had spent in Australia.
This assignment demonstrates the application of statistical methods and R programming to analyze data related to well-being. It uses descriptive statistics, one-way ANOVA, simple linear regression, and the chi-squared test of independence to address research questions about BMI, systolic blood pressure, age, time in Australia, and hypertension.