# Multiple regression


Suppose we have three categories, C1, C2, and C3, with three observations each, and assume the values (1, 2, 3) for Y. The design matrices are given by
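One way to write out design matrices for a setup like this is with `model.matrix()` in R. The sketch below is only an illustration under assumptions: the variable name `group` is hypothetical, and the two coding schemes shown (treatment coding and cell-means coding) are common choices, not necessarily the ones the original matrices used. The response Y is not needed to build a design matrix, so it is left out.

```r
# Hypothetical toy setup: three categories with three observations each.
group <- factor(rep(c("C1", "C2", "C3"), each = 3))

# Treatment (dummy) coding: an intercept column plus indicators for C2 and C3,
# with C1 acting as the reference level.
X_treatment <- model.matrix(~ group)

# Cell-means coding: one indicator column per category and no intercept.
X_cellmeans <- model.matrix(~ 0 + group)

print(X_treatment)
print(X_cellmeans)
```

Both matrices have nine rows and three columns and span the same column space, so they lead to the same fitted values; only the meaning of the coefficients differs.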

### Test of independence and t-test

H0: The type of fish consumption ("fishpart") is independent of the "fisherman" classification.

H1: The type of fish consumption ("fishpart") is not independent of the "fisherman" classification.

Rejection rule: reject H0 if the p-value <= 0.05.

Chi-square test output:

- data: data\$fishpart and data\$fisherman
- X-squared = 35.033, df = 3, p-value = 1.199e-07

The p-value is below 0.05, so we reject H0 and conclude that the type of fish consumption ("fishpart") is not independent of the "fisherman" classification.

The t-test is used to check for equality of means. To check whether the mean of total mercury is equal in the fisherman and non-fisherman populations, we run a t-test with the following hypotheses:

H0: The mean of total mercury is equal in the fisherman and non-fisherman populations.

H1: The mean of total mercury is not equal in the fisherman and non-fisherman populations.

Welch Two Sample t-test output:

- data: data\$TotHg by data\$fisherman
- t = -3.9101, df = 126.81, p-value = 0.0001495
- alternative hypothesis: true difference in means is not equal to 0
- 95% confidence interval: -2.3557752 to -0.7725705
- sample estimates: mean in group 0 = 2.616657, mean in group 1 = 4.180830

The p-value is below 0.05, so we reject H0 and conclude that the mean of total mercury is not equal in the fisherman and non-fisherman populations.
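The two tests above could be run along the following lines. This is a minimal sketch on simulated stand-in data, not the original data set: the sample size of 135 is inferred from the 131 residual degrees of freedom reported for the regressions, the four `fishpart` codes are inferred from df = 3 in the chi-square test, and the simulated values themselves are arbitrary, so the printed p-values will not match the output quoted above.

```r
# Simulated stand-in data (NOT the original data set).
set.seed(42)
n <- 135
data <- data.frame(
  fisherman = sample(0:1, n, replace = TRUE),  # assumed 0/1 group indicator
  fishpart  = sample(0:3, n, replace = TRUE),  # assumed four category codes
  TotHg     = rexp(n, rate = 0.3)              # arbitrary positive response
)

# Chi-square test of independence on the fishpart x fisherman table.
chi <- chisq.test(table(data$fishpart, data$fisherman))

# Two-sample t-test; t.test() uses the Welch (unequal variance) form by default.
tt <- t.test(TotHg ~ fisherman, data = data)

# Apply the rejection rule stated above.
alpha <- 0.05
cat("Chi-square p-value:", chi$p.value, "| reject H0:", chi$p.value <= alpha, "\n")
cat("Welch t-test p-value:", tt$p.value, "| reject H0:", tt$p.value <= alpha, "\n")
```

Printing `chi` or `tt` directly gives the full console output in the same layout as the results quoted above.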

### Interpretation of the multiple regression

This section is divided into two parts, because two regression models were fitted to the data.
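As a rough illustration of how two such models can be fitted, the sketch below uses the formulas that appear in the outputs in this section. The data are simulated stand-ins with arbitrary generating values (everything inside the `data.frame()` call and the response formula is an assumption), and `log_TotHg` is assumed to be the natural log of `TotHg`.

```r
# Simulated stand-in data (NOT the original data set).
set.seed(42)
n <- 135
data <- data.frame(
  fishmlwk = rpois(n, lambda = 7),             # fish meals per week (assumed scale)
  weight   = rnorm(n, mean = 73, sd = 8),      # body weight (assumed scale)
  fishpart = sample(0:3, n, replace = TRUE)    # assumed numeric category code
)
# Arbitrary right-skewed, positive response for illustration only.
data$TotHg <- exp(-3 + 0.02 * data$fishmlwk + 0.05 * data$weight +
                  0.2 * data$fishpart + rnorm(n, sd = 0.8))
data$log_TotHg <- log(data$TotHg)              # assumed natural log

# Model one: response on the original scale.
model1 <- lm(TotHg ~ fishmlwk + weight + fishpart, data = data)

# Model two: response on the log scale.
model2 <- lm(log_TotHg ~ fishmlwk + weight + fishpart, data = data)

print(summary(model1)$coefficients)
print(summary(model2)$coefficients)
```

Calling `summary(model1)` prints the residual quantiles, coefficient table, residual standard error, R-squared values, and F-statistic in one block.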

### Regression model one output

Call: lm(formula = TotHg ~ fishmlwk + weight + fishpart, data = data)

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -4.8689 | -1.3979 | -0.3596 | 0.6373 | 11.3905 |

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -10.72347 | 2.52476 | -4.247 | 4.07e-05 | *** |
| fishmlwk | 0.15049 | 0.04253 | 3.539 | 0.000557 | *** |
| weight | 0.17705 | 0.03327 | 5.322 | 4.32e-07 | *** |
| fishpart | 0.33003 | 0.32257 | 1.023 | 0.308129 | |

Residual standard error: 2.564 on 131 degrees of freedom. Multiple R-squared: 0.2557, Adjusted R-squared: 0.2387. F-statistic: 15 on 3 and 131 DF, p-value: 1.882e-08.

After fitting the model to the sample, the regression output tells us that, holding the other predictors fixed:

- For every unit increase in fish meals per week, Total Mercury increases by 0.15.
- For every unit increase in weight, Total Mercury increases by 0.18.
- Total Mercury increases by 0.33 when muscle tissue only is used compared with when no fish part is used, although this coefficient is not statistically significant (p = 0.308).

The scatter plot for the Y variable and the X independent variables is given below.

Diagnostic plots for the model

The normal Q-Q plot above isn't satisfactory. Most of the points fall off the straight reference line, and outliers are present in the data.

Because the Q-Q plot reveals outliers, we should remove the outlying data points (observations 7, 85, and 93) from the data.
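The diagnostic step described above can be sketched as follows. The data are again simulated stand-ins, so the flagged rows will not be observations 7, 85, and 93; flagging the three largest absolute standardized residuals is an assumed rule for illustration, not necessarily how the points above were chosen.

```r
# Simulated stand-in data (NOT the original data set).
set.seed(42)
n <- 135
data <- data.frame(
  fishmlwk = rpois(n, lambda = 7),
  weight   = rnorm(n, mean = 73, sd = 8),
  fishpart = sample(0:3, n, replace = TRUE)
)
data$TotHg <- exp(-3 + 0.02 * data$fishmlwk + 0.05 * data$weight +
                  0.2 * data$fishpart + rnorm(n, sd = 0.8))

fit <- lm(TotHg ~ fishmlwk + weight + fishpart, data = data)

# Residuals-vs-fitted plot (which = 1) and normal Q-Q plot (which = 2).
plot(fit, which = 1:2)

# Assumed rule: flag the three rows with the largest absolute standardized residuals.
std_res <- rstandard(fit)
flagged <- order(abs(std_res), decreasing = TRUE)[1:3]
print(flagged)

# Refit the same model without the flagged rows.
fit_trimmed <- lm(TotHg ~ fishmlwk + weight + fishpart, data = data[-flagged, ])
print(summary(fit_trimmed)$coefficients)
```

Comparing the coefficient tables of `fit` and `fit_trimmed` shows how sensitive the estimates are to the flagged observations.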

### Regression model two

The output of the regression model is given by

Call: lm(formula = log_TotHg ~ fishmlwk + weight + fishpart, data = data)

Residuals:

| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -4.6626 | -0.2956 | 0.0659 | 0.3968 | 1.9733 |

Coefficients:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -3.10544 | 0.78155 | -3.973 | 0.000116 | *** |
| fishmlwk | 0.01583 | 0.01316 | 1.203 | 0.231282 | |
| weight | 0.04981 | 0.01030 | 4.837 | 3.64e-06 | *** |
| fishpart | 0.22964 | 0.09985 | 2.300 | 0.023040 | * |

Residual standard error: 0.7937 on 131 degrees of freedom. Multiple R-squared: 0.1915, Adjusted R-squared: 0.173. F-statistic: 10.34 on 3 and 131 DF, p-value: 3.726e-06.

Because the response is now log_TotHg, the coefficients describe changes on the log scale. Holding the other predictors fixed, the regression output tells us that:

- For every unit increase in fish meals per week, log Total Mercury increases by 0.016, although this coefficient is not statistically significant (p = 0.231).
- For every unit increase in weight, log Total Mercury increases by 0.050.
- Log Total Mercury increases by 0.23 when muscle tissue only is used compared with when no fish part is used.

The scatter plots are given below.
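Coefficients from a log-response model are often easier to read as percentage changes in the original response. The short calculation below back-transforms the model two estimates quoted in this section; it assumes `log_TotHg` is the natural log of `TotHg`, which the output itself does not state.

```r
# Coefficient estimates as reported in the model two output.
coefs <- c(fishmlwk = 0.01583, weight = 0.04981, fishpart = 0.22964)

# Under a natural-log response, a one-unit increase in a predictor multiplies
# the response by exp(coefficient), holding the other predictors fixed.
multiplier <- exp(coefs)

# Express the multiplier as an approximate percentage change in TotHg.
pct_change <- 100 * (multiplier - 1)
print(round(pct_change, 1))
```

Under that assumption, a one-unit increase in weight corresponds to roughly a 5% increase in Total Mercury, and a one-step increase in the `fishpart` code to roughly a 26% increase.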

### Diagnostic Plot

This diagnostic plot shows an acceptable residual structure now. Neither unequal variability nor major curvature is present.

The normal Q-Q plot is satisfactory. Most of the points fall on the straight reference line, so the assumption of normally distributed error terms seems reasonable. The Q-Q plot still reveals some outliers, however, and we should remove the outlying data points (observations 13, 46, and 93) from the data.

The first model in question 4 has an R-squared of 0.2557, while the second model has an R-squared of 0.1915. The model with the higher R-squared is said to be better, so by this criterion Model 1 performs better than Model 2. Note that the two models use different response variables (TotHg versus log_TotHg), so their R-squared values are measured on different scales and the comparison should be read with caution.
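The reported adjusted R-squared values can be checked from the multiple R-squared values with the standard adjustment formula. In the sketch below, the helper name `adj_r2` is hypothetical, and the sample size n = 135 is inferred from the 131 residual degrees of freedom plus the four estimated coefficients.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 135  # inferred: 131 residual df + 4 coefficients
p <- 3    # fishmlwk, weight, fishpart

adj_model1 <- adj_r2(0.2557, n, p)
adj_model2 <- adj_r2(0.1915, n, p)

print(round(c(model1 = adj_model1, model2 = adj_model2), 4))
```

Both results agree with the adjusted values in the regression outputs (0.2387 and 0.173) to the precision reported.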