# Hypothesis testing

For most students, the hard part of hypothesis testing is analysing the results, not conducting the test. For this reason, we focus here on interpreting a statistical hypothesis test that has already been run. Take your time with the interpretation and conclusions we provide. Let's start with the model used in this analysis.

## Fitting a regression model

Consider the following regression model:

$$\text{Lung}_i = \alpha + \beta \times \text{CIG}_i$$

We expect lung cancer to become more likely as cigarette consumption increases, as is popularly believed. Mathematically, we want to test whether $\beta > 0$ in the regression model above. This matters because it shows how strong the effect of cigarettes on lung cancer is.

Let's define our hypotheses formally:

$H_0$: Smoking doesn't have any effect on the likelihood of lung cancer

vs

$H_a$: $H_0$ is false, and smoking has a significant effect on lung cancer
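
The model above has two unknowns, α and β, which are estimated by ordinary least squares. As a minimal sketch of what that estimation does for a single predictor, the function below applies the textbook closed-form formulas. The function name and the toy numbers are illustrative assumptions; they are not the study's dataset, and the analysis itself was run in SPSS.

```python
def fit_simple_ols(x, y):
    """Return (alpha, beta) for the least-squares line y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Toy values lying exactly on y = 1 + 2x (hypothetical, NOT the CIG/LUNG data).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]
alpha_hat, beta_hat = fit_simple_ols(toy_x, toy_y)
print(alpha_hat, beta_hat)
```

With the real data, `x` would hold the CIG values and `y` the LUNG values, and the test of interest concerns the sign and size of `beta_hat`.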

### Data used in the analysis

For this statistical analysis, our SPSS expert selected data on the number of people who have lung cancer and the number of people who smoke cigarettes. The dataset is available here. It has one categorical variable, state, and two numeric variables, CIG and LUNG, with 44 data points in total. We use LUNG as the variable of interest (the response) and CIG as the explanatory variable.

Figure 1: Pictorial representation of data

### Descriptive Statistics

Before the full analysis, it helps to see how the data are distributed, so we start with a summary of the data.

First, let's look at the boxplot of the two quantitative variables. There are a few outliers at the top end of the CIG variable and none in the LUNG variable.

Now let's look at the scatter plot of the two quantitative variables, which already has a linear regression line fitted. The data points are not perfectly linear but are close to it, so a linear regression study makes sense.

We'll also look at the normal Q-Q plots of the two variables to see whether they are normally distributed. The plots look reasonably straight, and our expert concluded that the two variables are approximately normally distributed.
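
The outliers seen in the boxplot are usually flagged with the 1.5 × IQR rule (Tukey's fences). The sketch below shows that rule, assuming Python's default quartile method, which can differ slightly from the hinges SPSS uses. The helper names and the toy values are hypothetical and are not taken from the dataset.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) fences used to flag boxplot outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    low, high = tukey_fences(values, k)
    return [v for v in values if v < low or v > high]


# Toy values with one unusually large observation (hypothetical data).
toy_values = [20, 21, 22, 23, 24, 25, 26, 27, 28, 45]
outliers = flag_outliers(toy_values)
print(outliers)
```

Applied to the CIG column, a check like this would list the high-end points that appear above the whisker in the boxplot.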

### Correlation analysis

Here our experts performed the regression analysis to see whether there is a significant relationship between CIG and LUNG. The model summary is shown below:

Model Summary

| R | R² | Adjusted R² | Std. error of the estimate |
|---|---|---|---|
| .697 | .486 | .474 | 3.06607 |

The $R^2$ of the model is 0.486, so CIG explains about 48.6% of the variation in LUNG. The model is statistically significant at any reasonable level of significance, with a p-value of approximately 0.

ANOVA

| Model | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 373.878 | 1 | 373.878 | 39.771 | |
| Residual | 394.833 | 42 | 9.401 | | |
| Total | 768.712 | 43 | | | |
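
The figures in the model summary can be cross-checked against the ANOVA output with a few lines of arithmetic. The sketch below uses only the sums of squares and degrees of freedom reported above. The variable names are our own, and small differences in the last digit come from rounding in the printed output.

```python
import math

# Values read from the ANOVA output above.
ss_regression = 373.878
ss_residual = 394.833
df_regression = 1
df_residual = 42
n_obs = 44  # number of data points

ss_total = ss_regression + ss_residual

# Share of the variation in LUNG explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R^2 penalises for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# Mean squares, F statistic, and standard error of the estimate.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual
std_error_estimate = math.sqrt(ms_residual)

print(round(r_squared, 3), round(adjusted_r_squared, 3))
print(round(f_statistic, 2), round(std_error_estimate, 3))
```

The results agree with the reported R² (.486), adjusted R² (.474), F (39.771) and standard error of the estimate (3.06607) up to rounding.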

Coefficients

| Model | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| 1 (Constant) | 6.472 | 2.141 | | 3.023 | .004 |
| 1 CIG | .529 | .084 | .697 | 6.306 | .000 |

The residuals are plotted against the predicted values. There is no visible pattern, and the two quantities look uncorrelated.

The regression equation is:

$$\text{Lung} = 6.472 + 0.529 \times \text{CIG}$$

So, we test the following hypotheses:

$$H_0: \beta = 0 \quad \text{vs} \quad H_1: \beta \neq 0$$

The p-value is approximately 0, which suggests rejecting $H_0$. We test the hypothesis against the data and reject the null hypothesis in favor of the alternative, since the data provide significant evidence against it. Because the estimated slope is positive, this result also supports the one-sided claim $\beta > 0$.
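
The test decision can be reproduced from the coefficients output: each t statistic is the estimate divided by its standard error, and the two-sided p-value is the tail area of a t distribution with 42 residual degrees of freedom. The sketch below is a rough check that uses only the standard library and integrates the t density numerically with Simpson's rule. SPSS computes this differently, and the function names are our own.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=20000):
    """Two-sided p-value: 1 - P(-|t| < T < |t|), via Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # area between 0 and |t|
    return max(0.0, 1 - 2 * central_half)


df_residual = 42

# t = B / Std. Error, using the rounded values from the coefficients output.
t_const_ratio = 6.472 / 2.141
t_cig_ratio = 0.529 / 0.084

# p-values from the t statistics as reported by SPSS.
p_const = two_sided_p_value(3.023, df_residual)
p_cig = two_sided_p_value(6.306, df_residual)

print(round(t_const_ratio, 3), round(t_cig_ratio, 3))
print(p_const, p_cig)
```

The ratio for CIG differs slightly from the reported 6.306 because B and its standard error are rounded to three decimals in the output. The p-value for CIG is far below .001, which is why SPSS prints it as .000.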

### Conclusion

After the regression study, our online SPSS tutors concluded that there is significant evidence that CIG has an effect on LUNG. The coefficient is positive, and the p-value of the test is very small, indicating a strong relationship between the two.

Of course, correlation is never proof of causation, and we do not have individual-level data to support the stronger claim (the people who have lung cancer may not be the same people who smoke). So, a next step for research could be to collect a longitudinal dataset of individuals and observe them over a longer period to reach a firmer conclusion.