## Assistance with a Data Analysis Assignment Involving a Model for Prediction Purposes

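To get a model with high predictive power for this data analysis assignment, I started with a full model that included all the variables. I then used backward selection, removing one non-significant variable (p-value greater than 0.1) at a time and re-estimating the model after each removal. The initial model is

$$\text{weekly\_sales} = \beta_0 + \beta_1\,\text{Temperature} + \beta_2\,\text{Fuel\_Price} + \beta_3\,\text{MarkDown1} + \beta_4\,\text{MarkDown2} + \beta_5\,\text{MarkDown3} + \beta_6\,\text{MarkDown4} + \beta_7\,\text{MarkDown5} + \beta_8\,\text{CPI} + \beta_9\,\text{Unemployment} + \beta_{10}\,\text{Store\_sqft} + \beta_{11}\,\text{inspection} + \beta_{12}\,\text{large\_city} + \beta_{13}\,\text{medium\_city} + \varepsilon$$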

Since MarkDown 1-5 must be in the final model, I did not consider them for removal. The results show that only medium city is not significant (p=0.135); therefore, I removed it from the model.

Re-estimating the model, I found that city size (large city) became insignificant and had the largest p-value (p=0.103), so I removed it from the model. After re-estimation, temperature was the next insignificant variable with the largest p-value (p=0.1088), so I removed it and re-estimated the model. After that re-estimation, inspection was the next insignificant variable with the largest p-value (p=0.1088), so I removed it and re-estimated the model.
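The backward selection loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the exact procedure used for the assignment: the data are synthetic, the variable names (`fuel_price`, `markdown1`, `noise_a`, ...) are hypothetical stand-ins, `markdown1` plays the role of a variable that must stay in the model, and p-values use a normal approximation to the t distribution (reasonable only for large samples). A statistics package would normally report exact t-based p-values.

```python
import math
import random

def invert(matrix):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_pvalues(y, columns):
    """OLS with an intercept; returns {variable: two-sided p-value} (normal approximation to t)."""
    names = list(columns)
    n = len(y)
    rows = [[1.0] + [columns[name][i] for name in names] for i in range(n)]
    k = len(names) + 1
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2 for row, yi in zip(rows, y))
    sigma2 = rss / (n - k)  # residual variance
    return {
        name: math.erfc(abs(beta[j] / math.sqrt(sigma2 * inv[j][j])) / math.sqrt(2))
        for j, name in enumerate(names, start=1)
    }

def backward_select(y, columns, forced, threshold=0.1):
    """Repeatedly drop the non-forced variable with the largest p-value above `threshold`."""
    current = dict(columns)
    while True:
        pvals = ols_pvalues(y, current)
        candidates = {name: p for name, p in pvals.items() if name not in forced}
        worst = max(candidates, key=candidates.get, default=None)
        if worst is None or candidates[worst] <= threshold:
            return list(current), pvals
        del current[worst]  # remove one variable, then re-estimate

# Hypothetical synthetic data: only fuel_price truly drives sales.
random.seed(7)
n = 300
data = {name: [random.gauss(0, 1) for _ in range(n)]
        for name in ("fuel_price", "markdown1", "noise_a", "noise_b", "noise_c")}
sales = [5.0 + 3.0 * f + random.gauss(0, 1) for f in data["fuel_price"]]

kept, pvals = backward_select(sales, data, forced={"markdown1"})
print(kept)
print({name: round(p, 3) for name, p in pvals.items()})
```

Removing a single variable per pass, rather than all insignificant ones at once, matters because p-values shift after each re-estimation, as happened with large city above.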

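The results show that, apart from the markdown variables (which were kept regardless of significance), all remaining variables are statistically significant at the 10% level. The estimated model is thus:

$$\widehat{\text{weekly\_sales}} = -34045.48 + 4323.97\,\text{Fuel\_Price} - 0.2547\,\text{MarkDown1} + 0.0062\,\text{MarkDown2} - 0.0312\,\text{MarkDown3} + 0.0444\,\text{MarkDown4} - 0.7241\,\text{MarkDown5} + 35.04\,\text{CPI} + 4416.71\,\text{Unemployment} + 0.2822\,\text{Store\_sqft}$$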

The results are summarized in the regression table below; each effect holds the other variables constant. Fuel price has a significant positive effect on weekly sales: a one-dollar increase in fuel price increases weekly sales by $4,323.97 (p<0.001). MarkDown1 has a significant negative effect: a one-dollar increase in MarkDown1 reduces sales by $0.25 (p=0.0069). MarkDown2 has a positive but insignificant effect: a one-dollar increase in MarkDown2 increases sales by $0.0062 (p=0.916). MarkDown3 has a negative but insignificant effect: a one-dollar increase in MarkDown3 reduces sales by $0.03 (p=0.516). MarkDown4 has a positive but insignificant effect: a one-dollar increase in MarkDown4 increases sales by $0.04 (p=0.761). MarkDown5 has a significant negative effect: a one-dollar increase in MarkDown5 reduces sales by $0.72 (p<0.001). CPI has a positive and significant effect: a one-unit increase in CPI increases sales by $35.04 (p=0.013). Unemployment has a positive and significant effect: a one-percentage-point increase in unemployment increases sales by $4,416.71 (p<0.001). Store size has a positive and significant effect: a one-square-foot increase in store size increases sales by $0.28 (p=0.022).
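These interpretations are ceteris paribus effects. A minimal sketch using the coefficients from Table 1 and a hypothetical store-week (the input values are illustrative, not taken from the data) shows that changing one regressor by one unit while holding the others fixed moves the prediction by exactly that regressor's coefficient.

```python
# Coefficients copied from Table 1 (final model after backward selection).
INTERCEPT = -34045.475
COEFS = {
    "Fuel_Price": 4323.9719,
    "MarkDown1": -0.2546577,
    "MarkDown2": 0.006237,
    "MarkDown3": -0.0311915,
    "MarkDown4": 0.0444134,
    "MarkDown5": -0.7241132,
    "CPI": 35.039088,
    "Unemployment": 4416.708,
    "Store_sqft": 0.2822359,
}

def predict_weekly_sales(x):
    """Fitted weekly sales for one store-week; `x` maps each variable name to its value."""
    return INTERCEPT + sum(coef * x[name] for name, coef in COEFS.items())

# Hypothetical store-week; these values are illustrative only.
week = {"Fuel_Price": 3.2, "MarkDown1": 5000.0, "MarkDown2": 1000.0, "MarkDown3": 500.0,
        "MarkDown4": 2000.0, "MarkDown5": 3000.0, "CPI": 170.0, "Unemployment": 8.0,
        "Store_sqft": 30000.0}

base = predict_weekly_sales(week)
# Raise fuel price by $1 and hold everything else fixed.
effect = predict_weekly_sales({**week, "Fuel_Price": week["Fuel_Price"] + 1.0}) - base
print(round(effect, 2))
```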

The R-squared shows that the independent variables explain 18.28% of the variation in weekly sales (adjusted R-squared: 17.04%). The F-test shows that the overall model is statistically significant; F(9, 590) = 14.67, p < 0.001.

**Table 1: Regression table**

| Variable | Coefs | Std. Errors | tStat | Pvalue |
|---|---|---|---|---|
| Intercept | -34045.475 | 6945.834 | -4.90157 | 1.23E-06 |
| Fuel_Price | 4323.9719 | 995.7053 | 4.342622 | 1.66E-05 |
| MarkDown1 | -0.2546577 | 0.093879 | -2.71262 | 0.00687 |
| MarkDown2 | 0.006237 | 0.059163 | 0.10542 | 0.916078 |
| MarkDown3 | -0.0311915 | 0.047988 | -0.64999 | 0.515951 |
| MarkDown4 | 0.0444134 | 0.145924 | 0.304361 | 0.760961 |
| MarkDown5 | -0.7241132 | 0.121302 | -5.96953 | 4.11E-09 |
| CPI | 35.039088 | 14.07946 | 2.488667 | 0.013097 |
| Unemployment | 4416.708 | 582.68 | 7.579989 | 1.35E-13 |
| Store_sqft | 0.2822359 | 0.123361 | 2.287892 | 0.022497 |

| Statistic | Value |
|---|---|
| R-squared | 0.1828 |
| Adjusted R-squared | 0.1704 |
| F | 14.6658 |
| Significance F | 1.46E-21 |
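As a consistency check on these fit statistics, adjusted R-squared and the overall F statistic can be recomputed from R-squared. This sketch assumes k = 9 slope coefficients (the regressors in Table 1) and n = 600 observations; n is not stated in this section and is inferred from the 300 + 300 observations in Table 3, so treat it as an assumption.

```python
def regression_fit_stats(r2, k, n):
    """Adjusted R-squared, overall F statistic and its degrees of freedom from R-squared.

    k = number of slope coefficients (intercept excluded), n = number of observations.
    """
    df_resid = n - k - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / df_resid
    f_stat = (r2 / k) / ((1 - r2) / df_resid)
    return adj_r2, f_stat, (k, df_resid)

adj, f, df = regression_fit_stats(0.1828, k=9, n=600)
print(round(adj, 4), round(f, 2), df)
```

Under these assumptions the formulas reproduce the reported adjusted R-squared and F closely, which is also how the degrees of freedom of the F-test can be read off.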

**Figure 1: Plot of predicted sales for 3/12/2010 for the 100 stores**

Figure 1 plots the predicted weekly sales for 3/12/2010, and Table 2 shows the descriptive statistics of predicted and actual weekly sales.

**Table 2: Summary Statistics**

| Statistic | Prediction | Weekly sales |
|---|---|---|
| Mean | 20644.665 | 21228.292 |
| Median | 20886.36392 | 21795.495 |
| Standard deviation | 3250.073465 | 1307.006 |
| Sample variance | 10562977.53 | 1708264.720 |
| Kurtosis | 0.419042457 | -1.904 |
| Skewness | -0.640121747 | 0.073 |
| Minimum | 10322.26017 | 19928.220 |
| Maximum | 27185.29816 | 22897.852 |
| Count | 100 | 100 |

The summary statistics above show that average actual weekly sales are $21,228.29, while average predicted weekly sales are $20,644.67. The standard deviation of the predicted values ($3,250.07) is larger than that of actual weekly sales ($1,307.01). The smallest actual sale is $19,928.22, while the smallest predicted sale is $10,322.26. The largest actual sale is $22,897.85, while the largest predicted sale is $27,185.30.
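Table 2 appears to follow the layout of Excel's Descriptive Statistics output (that is an assumption). Under that assumption, the sketch below computes the same quantities with Python's `statistics` module plus Excel's sample-adjusted SKEW and KURT formulas. The underlying prediction data are not reproduced here, so the example runs on a toy list.

```python
import statistics

def describe(values):
    """Excel-style descriptive statistics (sample standard deviation, SKEW and KURT formulas)."""
    n = len(values)  # kurtosis formula needs n >= 4
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    z = [(v - mean) / sd for v in values]
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    # Excess kurtosis with the small-sample correction used by Excel's KURT.
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "mean": mean, "median": statistics.median(values), "std_dev": sd,
        "sample_variance": statistics.variance(values), "kurtosis": kurt,
        "skewness": skew, "min": min(values), "max": max(values), "count": n,
    }

d = describe([1, 2, 3, 4, 5])
print(d)
```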

The objective was to test whether mean weekly sales differ between group 1 (the weeks of 2/5, 2/12 and 2/19) and group 2 (the weeks of 2/26, 3/5 and 3/12). The null and alternative hypotheses are:

$$H_0: \mu_{\text{group 1}} = \mu_{\text{group 2}}$$

$$H_1: \mu_{\text{group 1}} \neq \mu_{\text{group 2}}$$

The t-test results are presented in Table 3. First, an F-test for equality of variances shows no significant difference in variance between the two groups (F=1.053, p=0.328), so we can assume equal variances and use the pooled-variance t-test. The average predicted sales for group 1 ($20,551.53) are greater than the average for group 2 ($20,494.18). However, the difference is not significant, t(598)=0.184, p=0.854. This means that there is no significant difference in sales between group 1 and group 2.

**Table 3: Independent t-test result**

| Statistic | Group 1 | Group 2 |
|---|---|---|
| Mean | 20551.53 | 20494.18 |
| Variance | 14943673 | 14190730 |
| Observations | 300 | 300 |
| Pooled variance | 14567202 | |
| Degrees of freedom | 598 | |
| t Stat | 0.184052 | |
| P(T<=t) one-tail | 0.427018 | |
| t Critical one-tail | 1.647406 | |
| P(T<=t) two-tail | 0.854035 | |
| t Critical two-tail | 1.963939 | |
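Both tests can be reproduced from the group summary statistics alone. This sketch takes the means, sample variances and group sizes from Table 3 as inputs and, as a simplification, uses the normal approximation to the t distribution for the two-sided p-value, which is very close to the exact t result at 598 degrees of freedom.

```python
import math

def pooled_t_test(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance (pooled) two-sample t-test from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    # Normal approximation to the t distribution (assumption; fine for large df).
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))
    return t, df, pooled_var, p_two_sided

# Variance-ratio F statistic: larger sample variance over the smaller one.
f_ratio = 14943673 / 14190730
t, df, pooled_var, p = pooled_t_test(20551.53, 14943673, 300, 20494.18, 14190730, 300)
print(round(f_ratio, 3), round(t, 3), df, round(p, 3))
```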

## Help with Data Analysis Homework on Fuel Price and its effect on Unemployment

It always helps in any **data analysis homework** to understand the variables present in the data. With the help of our diligent **experts**, in this section we examine which variables are of interest in this data analysis task.

Endogeneity of unemployment refers to a situation in which unemployment, as an explanatory variable, is correlated with the error term. It may arise from omitted variables or measurement error. If endogeneity is not dealt with, the OLS estimates will be biased because the Gauss-Markov assumption that the regressors are uncorrelated with the error term is violated.

There are two requirements for a variable to be a valid instrument. First, it must be correlated with the endogenous variable (relevance). Second, it must not be correlated with the error term in the sales equation, conditional on the other covariates (exogeneity). Based on these two requirements, fuel price could make a good instrument. It is plausibly related to unemployment: when fuel prices rise, firms' production costs increase and their output falls, which implies less employment. We also expect fuel price not to affect weekly sales directly, but only through its relationship with unemployment, since higher unemployment lowers purchasing power and therefore sales. The correlation coefficient between fuel price and unemployment is -0.429, a moderate correlation, so fuel price satisfies the relevance requirement. Note that this correlation is negative, whereas the cost-of-production argument predicts that higher fuel prices raise unemployment; relevance depends only on the strength of the correlation, while exogeneity cannot be checked with this correlation and rests on the economic argument.
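Relevance can also be checked numerically. The sketch below computes a Pearson correlation and, because a first stage with a single instrument is a simple regression (so its R-squared equals r squared), converts r into the first-stage F statistic. The sample size n = 600 is an assumption, as it is not stated in this section, and the threshold of 10 is a commonly cited rule of thumb for weak instruments rather than a result from this analysis.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_stage_f_from_r(r, n):
    """F statistic of a one-instrument first stage, using R^2 = r^2 for a simple regression."""
    r2 = r * r
    return r2 * (n - 2) / (1 - r2)

f_first = first_stage_f_from_r(-0.429, n=600)  # n = 600 is an assumption
print(round(f_first, 1), "passes F > 10 rule of thumb:", f_first > 10)
```

With r = -0.429 and the assumed n, the result lands close to the first-stage F reported in Table 4.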

The two-stage least squares model is shown below.

The first stage

$$\text{unemployment} = \alpha + \gamma\,\text{fuel\_price} + \mu$$

where $\alpha$ is the intercept, $\gamma$ is the slope, and $\mu$ is the error term from the first stage.

The second stage regresses weekly sales on the fitted values of unemployment from the first stage. The model for the second stage is:

$$\text{weekly\_sales} = \lambda + \beta\,\widehat{\text{unemployment}} + \varepsilon$$

where $\lambda$ is the intercept, $\beta$ is the slope, and $\varepsilon$ is the error term from the second stage.

The two-stage least squares results are presented in Table 4. The first stage shows that fuel price has a negative effect on unemployment: a one-dollar increase in fuel price reduces unemployment by 0.677 percentage points, and this effect is significant (p<0.0001). Fuel price explains 18.41% of the variation in unemployment (adjusted R-squared: 18.27%), and the F-test shows that the first-stage model is significant (F=134.90, p<0.0001). The second stage shows that the fitted unemployment from the first stage has a positive but insignificant effect on predicted weekly sales (p=0.834): a one-percentage-point increase in unemployment is associated with a $119.28 increase in sales. However, the second stage explains only 0.01% of the variation in sales, and the F-test shows that it is not significant (p=0.834).

**Table 4: Two-stage least-squares results**

**First Stage**

| Variable | Coefs | Std. Errors | tStat | Pvalue |
|---|---|---|---|---|
| Intercept | 9.483127 | 0.188855 | 50.21378 | 1.2E-216 |
| Fuel_Price | -0.677448 | 0.058326 | -11.6148 | 2.9E-28 |

| Statistic | Value |
|---|---|
| R Square | 0.1841 |
| Adjusted R Square | 0.1827 |
| F | 134.9029 |
| Significance F | 2.9E-28 |

**Second Stage**

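| Variable | Coefs | Std. Errors | tStat | Pvalue |
|---|---|---|---|---|
| Intercept | 19651.34 | 4157.345 | 4.726896 | 2.85E-06 |
| fitted_unemployment | 119.2752 | 568.5691 | 0.209781 | 0.83391 |

| Statistic | Value |
|---|---|
| R Square | 0.0001 |
| Adjusted R Square | -0.001599 |
| F | 0.044008 |
| Significance F | 0.83391 |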
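The two stages can be mimicked on synthetic data. In this sketch all numbers are hypothetical: an unobserved shock raises both unemployment and sales, making unemployment endogenous, while fuel price moves unemployment but not sales directly. With a single instrument, the second-stage slope coincides with the simple IV ratio cov(fuel_price, sales) / cov(fuel_price, unemployment).

```python
import random

def simple_ols(x, y):
    """Intercept and slope of the least-squares line of y on a single regressor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# Hypothetical data: `shock` is unobserved and enters both equations.
random.seed(42)
n = 600
fuel_price = [random.uniform(2.5, 4.0) for _ in range(n)]
shock = [random.gauss(0, 1) for _ in range(n)]
unemployment = [9.5 - 0.7 * f + 0.5 * s + random.gauss(0, 0.3) for f, s in zip(fuel_price, shock)]
weekly_sales = [20000 + 100 * u + 800 * s + random.gauss(0, 500) for u, s in zip(unemployment, shock)]

# Stage 1: regress unemployment on the instrument and keep the fitted values.
alpha, gamma = simple_ols(fuel_price, unemployment)
fitted_unemployment = [alpha + gamma * f for f in fuel_price]

# Stage 2: regress sales on the fitted (instrumented) unemployment.
lam, beta_2sls = simple_ols(fitted_unemployment, weekly_sales)

# One instrument: the 2SLS slope equals the IV ratio.
beta_iv = covariance(fuel_price, weekly_sales) / covariance(fuel_price, unemployment)
_, beta_ols = simple_ols(unemployment, weekly_sales)  # naive OLS, biased by the shock
print(f"2SLS: {beta_2sls:.1f}  IV ratio: {beta_iv:.1f}  naive OLS: {beta_ols:.1f}")
```

Running the second stage by hand recovers the 2SLS coefficient, but the standard errors from an ordinary second-stage regression are based on residuals computed with fitted rather than actual unemployment, so they are not the correct 2SLS standard errors; dedicated IV routines make that adjustment.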

## Model Interpretation Demonstrated by our Online Data Analysis Tutors

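The linear probability model for inspection is

$$\text{inspection} = \alpha_0 + \alpha_1\,\text{Temperature} + \alpha_2\,\text{Fuel\_Price} + \alpha_3\,\text{CPI} + \alpha_4\,\text{Unemployment} + \alpha_5\,\text{weekly\_sales} + \alpha_6\,\text{Store\_sqft} + \alpha_7\,\text{large\_city} + \alpha_8\,\text{medium\_city} + u$$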

The results of the model are presented in Table 5. Temperature has a positive but insignificant (p=0.848) effect on the probability of inspection: a one-degree increase in temperature increases the probability of inspection by approximately 0.0003. This suggests a slightly higher probability of inspection during hot periods, although the effect is not statistically significant. Fuel price has a negative but insignificant (p=0.501) effect: a one-dollar increase in fuel price reduces the probability of inspection by approximately 0.04, suggesting a lower probability of inspection when fuel is costly. CPI has a negative but insignificant (p=0.547) effect: a one-unit rise in CPI reduces the probability of inspection by approximately 0.0005, suggesting a lower probability of inspection when prices are rising. Unemployment has a negative but insignificant (p=0.886) effect: a one-percentage-point increase in unemployment reduces the probability of inspection by approximately 0.0054, suggesting a lower probability of inspection when unemployment is high. Weekly sales have a positive effect that is significant at the 10% level (p=0.050): a one-dollar increase in weekly sales increases the probability of inspection by approximately 4.697e-06, so stores with higher sales are more likely to be inspected. Store square footage has a positive but insignificant (p=0.229) effect: a one-square-foot increase in store size increases the probability of inspection by approximately 9.22e-06, suggesting that larger stores are somewhat more likely to be inspected.
The probability of inspecting stores in a large city is significantly higher (p=0.013) than that of inspecting stores in a small city, by 0.128, while the probability of inspecting stores in a medium city is higher than that of inspecting stores in a small city by 0.085, which is significant at the 10% level (p=0.099).

**Table 5: Linear Probability Model Result**

| Variable | Coeffs | Std. Errors | tStat | Pvalue |
|---|---|---|---|---|
| Intercept | 0.3774 | 0.4328 | 0.8720 | 0.3836 |
| Temperature | 0.0003 | 0.0015 | 0.1920 | 0.8478 |
| Fuel_Price | -0.0424 | 0.0630 | -0.6731 | 0.5012 |
| CPI | -0.0005 | 0.0009 | -0.6021 | 0.5474 |
| Unemployment | -0.0054 | 0.0374 | -0.1438 | 0.8857 |
| weekly_sales | 0.0000 | 0.0000 | 1.9638 | 0.0500 |
| Store_sqft | 0.0000 | 0.0000 | 1.2035 | 0.2293 |
| Largecity | 0.1279 | 0.0512 | 2.4973 | 0.0128 |
| MediumCity | 0.0848 | 0.0513 | 1.6534 | 0.0988 |

The shortcoming of this linear probability model is that its predicted probabilities can be less than 0 or greater than 1, even though a probability must lie between 0 and 1.
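A linear probability model prediction is just the linear index, which is why nothing keeps it inside [0, 1]. The sketch below uses the Table 5 coefficients, substituting the unrounded weekly-sales and store-size coefficients quoted in the text (Table 5 rounds them to 0.0000). The store values are hypothetical, and the units of Store_sqft in the source data are not stated here.

```python
# Table 5 coefficients; weekly_sales and Store_sqft use the values quoted in the text.
LPM_INTERCEPT = 0.3774
LPM_COEFS = {
    "Temperature": 0.0003,
    "Fuel_Price": -0.0424,
    "CPI": -0.0005,
    "Unemployment": -0.0054,
    "weekly_sales": 4.697e-06,
    "Store_sqft": 9.22e-06,
    "large_city": 0.1279,
    "medium_city": 0.0848,
}

def lpm_probability(x):
    """Linear probability model prediction: the raw linear index, not clipped to [0, 1]."""
    return LPM_INTERCEPT + sum(coef * x[name] for name, coef in LPM_COEFS.items())

# Hypothetical small-city store (both city dummies are 0).
store = {"Temperature": 60.0, "Fuel_Price": 3.2, "CPI": 170.0, "Unemployment": 8.0,
         "weekly_sales": 21000.0, "Store_sqft": 20000.0, "large_city": 0, "medium_city": 0}

# The city dummy shifts the probability by a constant amount, whatever the other values.
large_city_gap = lpm_probability({**store, "large_city": 1}) - lpm_probability(store)

# Increasing store size keeps raising the prediction without bound.
for sqft in (20000.0, 100000.0, 300000.0):
    print(sqft, round(lpm_probability({**store, "Store_sqft": sqft}), 3))
```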

The logit results show that only store square footage is a significant predictor of the odds of inspection, while all other variables are not significant. A one-dollar increase in weekly sales increases the log odds of inspection by 1.47e-05, and a one-square-foot increase in store size increases the log odds of inspection by 6.13e-05. A one-dollar increase in fuel price reduces the log odds of inspection by 0.122, a one-unit increase in CPI reduces the log odds of inspection by 0.0026, and a one-percentage-point increase in unemployment increases the log odds of inspection by 0.938.

The estimated logit model is

$$l = \log\frac{p}{1-p} = -0.836 + 0.000015\,\text{weekly\_sales} + 0.000061\,\text{store\_sqft} - 0.122\,\text{fuel\_price} - 0.0026\,\text{CPI} + 0.0129\,\text{Unemployment}$$

The average fuel price is 3.212528.

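Using only the fuel-price term of the index (a simplification that ignores the intercept and the other covariates), the probability at this average is

$$p(\text{fuel\_price}) = \frac{1}{1 + e^{-\beta \cdot \text{fuel\_price}}}$$

$$p(3.212528) = \frac{1}{1 + e^{-(-0.121675 \times 3.212528)}} = \frac{1}{1 + e^{0.390884}} \approx 0.404$$

For a one-dollar increase in fuel price:

$$p(4.212528) = \frac{1}{1 + e^{-(-0.121675 \times 4.212528)}} = \frac{1}{1 + e^{0.512559}} \approx 0.375$$

The change in probability is 0.375 - 0.404 = -0.029.

Therefore, the conclusion of the **online data analysis tutor** is that a $1 increase in fuel price would reduce the probability of an inspection by about 0.029, consistent with the negative fuel-price coefficient.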
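The probability calculation can be checked in a few lines. This sketch follows the same simplification as above: only the fuel-price term enters the log-odds index, so the intercept and other covariates are ignored. It also compares the discrete change with the derivative-based approximation beta * p * (1 - p).

```python
import math

BETA_FUEL = -0.121675  # fuel-price coefficient from the logit model
AVG_FUEL = 3.212528    # average fuel price

def logistic(index):
    """Map a log-odds index to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-index))

p_avg = logistic(BETA_FUEL * AVG_FUEL)
p_plus = logistic(BETA_FUEL * (AVG_FUEL + 1.0))
change = p_plus - p_avg

# Derivative-based approximation of the same effect: dp/dx = beta * p * (1 - p).
marginal = BETA_FUEL * p_avg * (1.0 - p_avg)
print(round(p_avg, 3), round(p_plus, 3), round(change, 3), round(marginal, 3))
```

Because 1/(1 + e^(-x)) = 1 - 1/(1 + e^(x)), writing the exponent with the wrong sign turns each probability into its complement (0.404 becomes 0.596) and reverses the direction of the effect.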