Problem Description:
In this JMP assignment, we analyze a dataset on blood pressure in the mythical town of Angina. The dataset covers 32 subjects and four key variables: systolic blood pressure (SBP), the Quetelet index (QUET, a measure of body mass), chronological age (AGE), and a smoking status indicator (SMK). We estimate six models (A, B, C, D, E, and F) to explore how these variables relate to systolic blood pressure, then assess the validity of the models and compare their statistical significance.
Solution:
- Correlation and Relationship Analysis. First, let's look at the correlation matrix for the key variables:
Variable | SBP | QUET | AGE | SMK |
SBP | 1.000 | 0.742 | 0.775 | 0.247 |
QUET | 0.742 | 1.000 | 0.803 | -0.071 |
AGE | 0.775 | 0.803 | 1.000 | -0.139 |
SMK | 0.247 | -0.071 | -0.139 | 1.000 |
From this, we can conclude that both QUET (r = 0.742) and AGE (r = 0.775) are strongly correlated with SBP, making them good candidate predictors. SMK is an indicator variable with only a weak marginal correlation with SBP (r = 0.247), so its contribution has to be tested within a regression model. Note also that AGE and QUET are highly correlated with each other (r = 0.803), which points to possible multicollinearity and may reduce the significance of one of these variables when both are in the model.
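One rough way to gauge how much the AGE and QUET correlation matters is the variance inflation factor (VIF). The sketch below is not part of the JMP output; it assumes a model containing only AGE and QUET, in which case both predictors share the same VIF, and `vif_two_predictors` is a made-up helper name.

```python
def vif_two_predictors(r: float) -> float:
    """VIF shared by two predictors whose pairwise correlation is r."""
    return 1.0 / (1.0 - r ** 2)

r_age_quet = 0.803  # AGE-QUET correlation from the matrix above
vif = vif_two_predictors(r_age_quet)
print(f"VIF for AGE and QUET: {vif:.2f}")
```

A value in this range means the variance of each coefficient is inflated by a bit under a factor of three, enough to weaken individual t-tests without making the estimates unusable. With SMK also in the model the VIFs would have to be computed from the full set of predictors.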
- Validity of Model C. Model C fits well: the overall F-test is statistically significant, and the model explains approximately 76.1% of the variation in SBP.
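The link between the 76.1% figure and the overall F-test can be checked by hand. The sketch below is only illustrative: it assumes Model C has three predictors (the text tests a coefficient β3 in Model C) and uses n = 32 subjects; the function name `overall_f` is made up for this example.

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """Overall F-statistic of a regression with k predictors and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# Assumed inputs: R^2 = 0.761 for Model C, n = 32 subjects, k = 3 predictors.
f_c = overall_f(0.761, n=32, k=3)
print(f"Overall F for Model C: {f_c:.2f}")
```

Under these assumptions the result lands very close to the F-value of 29.71 reported for Model C; the small gap comes from rounding R² to three decimals.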
- Comparison of F-values. The F-values for the different models are as follows:
Model A: 45.18
Model B: 39.16
Model C: 29.71
Model F: 29.71
All of these models are statistically significant. The overall F-value falls as predictors are added because each additional variable contributes less explanatory power while using up another degree of freedom.
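To see that a falling F-value does not mean a worse fit, the overall F can be converted back to R². This is a rough sketch under stated assumptions: Models A, B, and C are taken to have 1, 2, and 3 predictors respectively (the counts are not spelled out in this section), n = 32, and `r2_from_f` is a hypothetical helper name.

```python
def r2_from_f(f: float, n: int, k: int) -> float:
    """Recover R^2 from the overall F-statistic of a model with k predictors."""
    return k * f / (k * f + (n - k - 1))

# (F-value, assumed number of predictors) for each model
models = {"A": (45.18, 1), "B": (39.16, 2), "C": (29.71, 3)}
r2 = {name: r2_from_f(f, 32, k) for name, (f, k) in models.items()}

for name, value in r2.items():
    print(f"Model {name}: R^2 = {value:.3f}")
```

With these assumed predictor counts, R² rises from about 0.60 to about 0.73 to about 0.76 even as F falls, which matches the 73% and 76.1% figures quoted elsewhere in the solution.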
- Effect of Adding QUET to Model B. The reported partial coefficient of determination is 0.031, meaning that adding QUET explains an additional 3.1% of the total variation in SBP (R² rises from about 0.730 to 0.761). The reported partial coefficient of correlation is 0.018, a slight increase in the correlation between the fitted and actual SBP values when QUET is included.
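It may help to separate two quantities that are easy to confuse here: the increase in R² and the partial R², which expresses that increase as a share of the variation Model B left unexplained. The sketch below assumes R² values of 0.730 for Model B and 0.761 for Model C, taken from the percentages quoted in this solution; `partial_r2` is an illustrative name.

```python
def partial_r2(r2_reduced: float, r2_full: float) -> float:
    """Share of the reduced model's unexplained variation captured by the added term."""
    return (r2_full - r2_reduced) / (1.0 - r2_reduced)

r2_b, r2_c = 0.730, 0.761        # assumed R^2 for Models B and C
increment = r2_c - r2_b          # extra share of total variation explained
partial = partial_r2(r2_b, r2_c)

print(f"Increase in R^2: {increment:.3f}")
print(f"Partial R^2:     {partial:.3f}")
```

Under these assumptions the 0.031 figure is the increase in R², while the partial R² in the stricter sense comes out nearer 0.115.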
- Testing β3 in Model C. The t-test of H0: β3 = 0 gives a t-statistic of 1.910 with a p-value of 0.067. The partial F-test comparing Model B with Model C yields the same p-value of 0.067, as expected when a single variable is added. Since neither test is significant at the 5% level, Model C is not a significant improvement over Model B.
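The two tests agree because, when a single predictor is added, the partial F-statistic equals the square of that predictor's t-statistic. A minimal sketch follows, assuming R² values of 0.730 and 0.761 for Models B and C, three predictors in Model C, and n = 32; both function names are illustrative.

```python
def partial_f_from_t(t: float) -> float:
    """Partial F for one added predictor, from its t-statistic."""
    return t ** 2

def partial_f_from_r2(r2_reduced: float, r2_full: float,
                      n: int, k_full: int, q: int = 1) -> float:
    """Partial F for q added predictors, from the R^2 of the two nested models."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / (n - k_full - 1))

f_t = partial_f_from_t(1.910)
f_r2 = partial_f_from_r2(0.730, 0.761, n=32, k_full=3)

print(f"Partial F from t:   {f_t:.3f}")
print(f"Partial F from R^2: {f_r2:.3f}")
```

The two routes give nearly the same value (about 3.6); the small difference reflects rounding in the R² inputs rather than a disagreement between the tests.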
- Assessing the Contribution of SMK in Model F. The t-test of H0: β3 = 0 in Model F gives a t-statistic of 3.744 with a p-value of 0.001, which is statistically significant. The partial F-test comparing Model F with Model E yields the same significant p-value of 0.001, so SMK contributes significantly beyond the predictors already in Model E.
- Optimal Combination of Predictor Variables. The best combination of predictor variables is AGE and SMK. Both are statistically significant in the full model and jointly explain around 73% of the variation in SBP. Adding QUET does not significantly improve the model (p = 0.067), so it adds complexity without clear benefit. A model with AGE and SMK alone is therefore the better choice.