Table Of Contents
 Assistance with Data Analysis Assignment involving a Model for Prediction Purpose
 Help with Data Analysis Homework on Fuel Price and its effect on Unemployment
 Model Interpretation Demonstrated by our Online Data Analysis Tutors
Assistance with Data Analysis Assignment involving a Model for Prediction Purpose
To obtain a model with high predictive power for this data analysis assignment, I started with a full model that includes all the variables and then used backward selection to delete nonsignificant variables (p-value greater than 0.1) one at a time. The initial model is
Since MarkDown1 to MarkDown5 must be in the final model, I did not consider them for removal. Among the remaining variables, only the medium city indicator is not significant (p=0.135). Therefore, I removed it from the model.
Re-estimating the model, city size (large city) becomes the insignificant variable with the largest p-value (p=0.103); therefore, I removed it from the model. After re-estimation, temperature is the next insignificant variable with the largest p-value (p=0.1088); thus, I removed it and re-estimated the model. After this re-estimation, inspection is the next insignificant variable with the largest p-value (p=0.1088). Thus, I removed it and re-estimated the model.
The result shows that, apart from the markdown variables, all other variables are statistically significant. The estimated model is thus:
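The backward selection loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `backward_eliminate` and `fake_fit` are hypothetical names, and the p-values in `ILLUSTRATIVE_PVALUES` are made-up stand-ins for a real regression fit, which would return new p-values after every re-estimation.

```python
def backward_eliminate(variables, fit_pvalues, protected=(), threshold=0.1):
    """Drop the least significant unprotected variable until none exceeds threshold."""
    kept = list(variables)
    removed = []
    while True:
        pvalues = fit_pvalues(kept)  # re-estimate on the current variable set
        candidates = {
            name: p
            for name, p in pvalues.items()
            if name in kept and name not in protected and p > threshold
        }
        if not candidates:
            return kept, removed
        worst = max(candidates, key=candidates.get)  # largest p-value goes first
        kept.remove(worst)
        removed.append(worst)


# Hypothetical stand-in for a regression fit: fixed, made-up p-values.
ILLUSTRATIVE_PVALUES = {
    "fuel_price": 0.001,
    "markdown2": 0.916,
    "cpi": 0.013,
    "medium_city": 0.135,
    "temperature": 0.109,
}


def fake_fit(kept):
    """Return the illustrative p-value of every variable still in the model."""
    return {name: ILLUSTRATIVE_PVALUES[name] for name in kept}


kept, removed = backward_eliminate(
    list(ILLUSTRATIVE_PVALUES), fake_fit, protected=("markdown2",)
)
print("kept:", kept)
print("removed:", removed)
```

Passing the markdown variable as `protected` mirrors the rule that the markdowns stay in the model even when they are not significant.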
The result is summarized in the regression table below. Fuel price has a significant positive effect on weekly sales: a dollar increase in fuel price increases weekly sales by $4,323.97 (p<0.001). MarkDown1 has a significant negative effect: a dollar increase in MarkDown1 reduces sales by $0.25 (p=0.0069). MarkDown2 has a positive but insignificant effect: a dollar increase in MarkDown2 increases sales by $0.006 (p=0.916). MarkDown3 has a negative but insignificant effect: a dollar increase in MarkDown3 reduces sales by $0.03 (p=0.516). MarkDown4 has a positive but insignificant effect: a dollar increase in MarkDown4 increases sales by $0.04 (p=0.761). MarkDown5 has a significant negative effect: a dollar increase in MarkDown5 reduces sales by $0.72 (p<0.001). CPI has a positive and significant effect: a unit increase in CPI increases sales by $35.04 (p=0.013). Unemployment has a positive and significant effect: a one-percentage-point increase in Unemployment increases sales by $4,416.71 (p<0.001). Store size has a positive and significant effect: a one-square-foot increase in store size increases sales by $0.28 (p=0.022).
The adjusted R-squared shows that the independent variables in this model explain 17.04% of the variation in weekly sales (the unadjusted R-squared is 18.28%). The F-test shows that the overall model is statistically significant: F(10,127)=14.67, p<0.001.
Table 1: Regression table

| | Coefs | Std. Errors | tStat | Pvalue |
|---|---|---|---|---|
| Intercept | 34045.475 | 6945.834 | 4.90157 | 1.23E-06 |
| Fuel_Price | 4323.9719 | 995.7053 | 4.342622 | 1.66E-05 |
| MarkDown1 | -0.2546577 | 0.093879 | -2.71262 | 0.00687 |
| MarkDown2 | 0.006237 | 0.059163 | 0.10542 | 0.916078 |
| MarkDown3 | -0.0311915 | 0.047988 | -0.64999 | 0.515951 |
| MarkDown4 | 0.0444134 | 0.145924 | 0.304361 | 0.760961 |
| MarkDown5 | -0.7241132 | 0.121302 | -5.96953 | 4.11E-09 |
| CPI | 35.039088 | 14.07946 | 2.488667 | 0.013097 |
| Unemployment | 4416.708 | 582.68 | 7.579989 | 1.35E-13 |
| Store_sqft | 0.2822359 | 0.123361 | 2.287892 | 0.022497 |

| R-squared | Adjusted R-squared | F(10,127) | Significance F |
|---|---|---|---|
| 0.1828 | 0.1704 | 14.6658 | 1.46E-21 |
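Each t-statistic in Table 1 is the coefficient divided by its standard error. As a quick consistency check, the sketch below recomputes that ratio from the table's magnitudes; the variable names are illustrative, and signs are ignored so that only the arithmetic is compared.

```python
# (|coefficient|, standard error, reported |t|) copied from Table 1
rows = {
    "Intercept": (34045.475, 6945.834, 4.90157),
    "Fuel_Price": (4323.9719, 995.7053, 4.342622),
    "MarkDown1": (0.2546577, 0.093879, 2.71262),
    "MarkDown2": (0.006237, 0.059163, 0.10542),
    "MarkDown3": (0.0311915, 0.047988, 0.64999),
    "MarkDown4": (0.0444134, 0.145924, 0.304361),
    "MarkDown5": (0.7241132, 0.121302, 5.96953),
    "CPI": (35.039088, 14.07946, 2.488667),
    "Unemployment": (4416.708, 582.68, 7.579989),
    "Store_sqft": (0.2822359, 0.123361, 2.287892),
}


def t_statistic(coef, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_error


# Largest absolute gap between the recomputed and the reported t-statistics
max_gap = 0.0
for name, (coef, std_error, reported_t) in rows.items():
    t = t_statistic(coef, std_error)
    max_gap = max(max_gap, abs(t - reported_t))
    print(f"{name:13s} t = {t:.4f} (reported {reported_t})")
```

Any gap that remains comes from rounding in the printed coefficients and standard errors.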

Figure 1: Plot of predicted sales for 3/12/2010 for the 100 stores
Figure 1 plots the predicted weekly sales for 3/12/2010, and Table 2 shows the descriptive statistics.
Table 2: Summary Statistics

| | prediction | weekly sales |
|---|---|---|
| Average | 20644.665 | 21228.292 |
| Median | 20886.36392 | 21795.495 |
| Standard deviation | 3250.073465 | 1307.006 |
| Sample variance | 10562977.53 | 1708264.720 |
| Kurtosis | 0.419042457 | 1.904 |
| Skewness | 0.640121747 | 0.073 |
| Min | 10322.26017 | 19928.220 |
| Max | 27185.29816 | 22897.852 |
| Count | 100 | 100 |

The summary statistics above show that the average actual weekly sale is $21,228.29, while the average predicted weekly sale is $20,644.67. The standard deviation of the predicted values ($3,250.07) is larger than that of actual weekly sales ($1,307.01). The smallest actual sale is $19,928.22, while the smallest predicted sale is $10,322.26. The largest actual sale is $22,897.85, while the largest predicted sale is $27,185.30.
The objective was to test for a difference in mean weekly sales between group 1 and group 2, where group 1 comprises the weeks of 2/5, 2/12, and 2/19, and group 2 comprises the weeks of 2/26, 3/5, and 3/12. The null and alternative hypotheses are presented below.
\[ H_0: \bar{X}_{\text{group1}} = \bar{X}_{\text{group2}} \]
\[ H_1: \bar{X}_{\text{group1}} \neq \bar{X}_{\text{group2}} \]
The t-test result is presented in Table 3. First, the test of difference in variance shows no significant difference between the variances of the two groups (F=1.053, p=0.328). This means we can assume equal variances when computing the t-statistic. The average predicted sales for group 1 ($20,551.53) are greater than the average for group 2 ($20,494.18). However, the difference is not significant, t(598)=0.184, p=0.854. This means that there is no significant difference between sales for group 1 and group 2.
Table 3: Independent t-test result

| | group 1 | group 2 |
|---|---|---|
| Average | 20551.53 | 20494.18 |
| Variance | 14943673 | 14190730 |
| Observations | 300 | 300 |
| Pooled variance | 14567202 | |
| Degrees of freedom | 598 | |
| tStat | 0.184052 | |
| P(T<=t) one-tail | 0.427018 | |
| Critical value, one-tail | 1.647406 | |
| P(T<=t) two-tail | 0.854035 | |
| Critical value, two-tail | 1.963939 | |
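The pooled-variance t-statistic can be reproduced from the group summaries alone. The sketch below uses only the standard library; `pooled_t_test` is an illustrative name, and the two-tailed p-value is a normal approximation (an assumption that is reasonable here only because there are 598 degrees of freedom), so it will differ slightly from an exact t-distribution value.

```python
import math


def pooled_t_test(mean1, var1, n1, mean2, var2, n2):
    """Equal-variance two-sample t-test computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / std_error
    # Normal approximation to the two-tailed p-value (large df assumed)
    p_two_tailed = math.erfc(abs(t) / math.sqrt(2))
    return pooled_var, df, t, p_two_tailed


# Group averages, variances, and sizes as reported for group 1 and group 2
pooled_var, df, t_stat, p_approx = pooled_t_test(
    20551.53, 14943673, 300, 20494.18, 14190730, 300
)
variance_ratio = 14943673 / 14190730  # F statistic for the equal-variance check

print(f"pooled variance = {pooled_var:.1f}, df = {df}")
print(f"t = {t_stat:.4f}, approximate two-tailed p = {p_approx:.3f}")
print(f"variance ratio F = {variance_ratio:.3f}")
```

The small difference between the recomputed t and the reported 0.184052 comes from the rounding of the group averages.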

Help with Data Analysis Homework on Fuel Price and its effect on Unemployment
It always helps in any data analysis homework to understand the variables that are present in the data. In this section, our experts explain which variables are of interest in this data analysis task.
Endogeneity of Unemployment refers to a situation in which Unemployment, as an explanatory variable, is correlated with the error term. It may occur because of omitted variables or measurement error. If endogeneity is not dealt with, the model estimates will be biased because the Gauss-Markov assumptions are violated.
There are two requirements for a variable to be regarded as a valid instrument. First, it must be correlated with the endogenous variable. Second, it must not be correlated with the error term in the outcome equation, conditional on the other covariates. Based on these two requirements, fuel price could make a good instrument. It is correlated with Unemployment in the sense that when fuel prices rise, the cost of production for firms increases and their output falls, which implies less employment. We also expect fuel price not to affect weekly sales directly, but only through Unemployment, since Unemployment lowers purchasing power, which in turn lowers sales. The correlation coefficient between fuel price and Unemployment is 0.429 in magnitude, which is a medium correlation. This means fuel price and Unemployment are correlated, and fuel price can be a good instrument.
The model is shown below.
The first stage is
\[ \text{unemployment} = \alpha + \gamma\,\text{fuel\_price} + \mu \]
where α is the intercept, γ is the slope, and μ is the error term from the first stage.
The second stage regresses weekly sales on the fitted value of Unemployment from the first stage. The model for the second stage is
\[ \text{weekly\_sales} = \lambda + \beta\,\widehat{\text{unemployment}} + \varepsilon \]
where λ is the intercept, β is the slope, and ε is the error term from the second stage.
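The two stages described above can be sketched with two simple regressions run back to back. This is a minimal illustration, not the computation behind the reported results: the function names are hypothetical and the six data points are made up purely to make the code runnable.

```python
def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def two_stage_least_squares(instrument, endogenous, outcome):
    # Stage 1: regress the endogenous variable on the instrument
    alpha, gamma = simple_ols(instrument, endogenous)
    fitted = [alpha + gamma * z for z in instrument]
    # Stage 2: regress the outcome on the stage-1 fitted values
    lam, beta = simple_ols(fitted, outcome)
    return (alpha, gamma), (lam, beta)


# Made-up illustrative data (not the assignment's data set)
fuel_price = [2.6, 2.8, 3.0, 3.3, 3.6, 3.9]
unemployment = [8.1, 7.6, 7.9, 7.2, 7.0, 6.7]
weekly_sales = [21000.0, 20400.0, 21900.0, 20100.0, 22300.0, 20800.0]

first_stage, second_stage = two_stage_least_squares(
    fuel_price, unemployment, weekly_sales
)
print("first stage (alpha, gamma):", first_stage)
print("second stage (lambda, beta):", second_stage)
```

Running the second stage by hand gives the correct point estimates, but the standard errors from a plain second-stage regression are not valid two-stage least squares standard errors, so dedicated instrumental-variable routines are preferable for inference.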
The result of the two-stage least squares estimation is presented in Table 4. The first-stage result shows that fuel price has a negative impact on Unemployment. A dollar increase in fuel price reduces Unemployment by 0.677 percentage points, and this effect is significant (p<0.0001). Fuel price explains 18.41% of the variation in Unemployment (adjusted R-squared of 18.27%), and the F-test shows that the model is significant (p<0.0001). The second stage shows that the fitted Unemployment from the first stage has a positive but insignificant effect on predicted weekly sales (p=0.834). A one-percentage-point increase in Unemployment is associated with a $119.28 increase in sales, but the share of variation explained is only 0.01%, and the F-test shows that the model is not significant (p=0.834).
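Table 4: Two-stage least squares result

First Stage

| | Coefs | Std. Errors | tStat | Pvalue |
|---|---|---|---|---|
| Intercept | 9.483127 | 0.188855 | 50.21378 | 1.2E-216 |
| Fuel_Price | -0.677448 | 0.058326 | -11.6148 | 2.9E-28 |

| R Square | Adjusted R Square | F | Significance F |
|---|---|---|---|
| 0.1841 | 0.1827 | 134.9029 | 2.9E-28 |

Second Stage

| | Coefs | Std. Errors | tStat | Pvalue |
|---|---|---|---|---|
| Intercept | 19651.34 | 4157.345 | 4.726896 | 2.85E-06 |
| fitted_unemployment | 119.2752 | 568.5691 | 0.209781 | 0.83391 |

| R Square | Adjusted R Square | F | Significance F |
|---|---|---|---|
| 0.0001 | -0.001599 | 0.044008 | 0.83391 |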

Model Interpretation Demonstrated by our Online Data Analysis Tutors
The data generating model is
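\[ \text{inspection} = \alpha_0 + \alpha_1\,\text{Temperature} + \alpha_2\,\text{Fuel price} + \alpha_3\,\text{CPI} + \alpha_4\,\text{Unemployment} + \alpha_5\,\text{weekly sales} + \alpha_6\,\text{store sqft} + \alpha_7\,\text{large city} + \alpha_8\,\text{medium city} \]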
The result of the model is presented in Table 5. Temperature has a positive but insignificant (p=0.85) effect on the probability of inspection: a one-degree increase in temperature increases the probability of inspection by approximately 0.0003, so inspection is only marginally more likely during hot periods. Fuel price has a negative but insignificant (p=0.501) effect: a dollar increase in fuel price reduces the probability of inspection by approximately 0.04, which points to a lower probability of inspection when fuel is costly. CPI has a negative but insignificant (p=0.547) effect: a unit rise in CPI reduces the probability of inspection by approximately 0.0005, which points to a lower probability of inspection during a period of rising prices. Unemployment has a negative but insignificant (p=0.8857) effect: a one-percentage-point increase in Unemployment reduces the probability of inspection by approximately 0.0054, which points to a lower probability of inspection during a period of high Unemployment. Weekly sales have a positive effect that is significant at the 10% level (p=0.05): a dollar increase in weekly sales increases the probability of inspection by approximately 4.697e-06, so stores with high sales are more likely to be inspected. Store square feet have a positive but insignificant (p=0.229) effect: a one-square-foot increase in store size increases the probability of inspection by approximately 9.22e-06, which points to a higher probability of inspection for larger stores.
The probability of inspecting stores in a large city is significantly (p=0.013) higher than that of inspecting stores in a small city, by 0.128. The probability of inspecting stores in a medium city is higher than that of inspecting stores in a small city by 0.085, a difference that is significant only at the 10% level (p=0.099).
Table 5: Linear Probability Model Result

| | Coeffs | Std. Errors | tStat | Pvalue |
|---|---|---|---|---|
| Intercept | 0.3774 | 0.4328 | 0.8720 | 0.3836 |
| Temperature | 0.0003 | 0.0015 | 0.1920 | 0.8478 |
| Fuel_Price | -0.0424 | 0.0630 | -0.6731 | 0.5012 |
| CPI | -0.0005 | 0.0009 | -0.6021 | 0.5474 |
| Unemployment | -0.0054 | 0.0374 | -0.1438 | 0.8857 |
| weekly_sales | 0.0000 | 0.0000 | 1.9638 | 0.0500 |
| Store_sqft | 0.0000 | 0.0000 | 1.2035 | 0.2293 |
| Largecity | 0.1279 | 0.0512 | 2.4973 | 0.0128 |
| MediumCity | 0.0848 | 0.0513 | 1.6534 | 0.0988 |

The shortcoming of the linear probability model is that its predicted probabilities can be less than 0 or greater than 1, whereas a probability must lie between 0 and 1.
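A short numerical illustration of this shortcoming follows. It is a sketch under loud assumptions: it uses only the intercept (0.3774) and the weekly sales coefficient (4.697e-06) of the linear probability model, holds every other covariate at zero, and plugs in a hypothetical store with $200,000 in weekly sales. The logistic function is included only to show that it maps any real number into the interval (0, 1); it is not a fitted logit model.

```python
import math


def linear_probability(intercept, slope, x):
    """Prediction of a one-regressor linear probability model."""
    return intercept + slope * x


def logistic(z):
    """Logistic function, which always returns a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


INTERCEPT = 0.3774       # intercept as listed in Table 5
SALES_SLOPE = 4.697e-06  # weekly sales coefficient quoted in the text

hypothetical_sales = 200_000.0  # assumed value, chosen only for illustration
lpm_prediction = linear_probability(INTERCEPT, SALES_SLOPE, hypothetical_sales)
squashed = logistic(lpm_prediction)

print(f"linear probability prediction: {lpm_prediction:.3f}")
print(f"logistic transform of the same index: {squashed:.3f}")
```

The linear prediction exceeds 1 for this hypothetical store, which is the behaviour that motivates modelling the log odds instead.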
The result of the logit model tells us that only store square feet is a significant predictor of the odds of inspection, while all other variables are not significant. A dollar increase in weekly sales increases the log odds of inspection by 1.47e-05. A one-square-foot increase in store size increases the log odds of inspection by 6.13e-05. A dollar increase in fuel price reduces the log odds of inspection by 0.122. A unit increase in CPI reduces the log odds of inspection by 0.0026, and a one-percentage-point increase in Unemployment increases the log odds of inspection by 0.0129.
The data generating process is
\[ l = \log\frac{p}{1-p} = 0.836 + 0.000015\,\text{weekly sales} + 0.000061\,\text{store sqft} - 0.122\,\text{fuel price} - 0.0026\,\text{CPI} + 0.0129\,\text{Unemployment} \]
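The average fuel price is 3.212528.
Considering only the fuel price term, with β = -0.121675 (the intercept and the other covariates are left out of this calculation), the probability at this average is
\[ p_{\text{fuel price}} = \frac{1}{1+e^{-\beta \cdot \text{fuel price}}} \]
\[ p_{\text{fuel price}} = \frac{1}{1+e^{0.121675 \times 3.212528}} \]
\[ p_{\text{fuel price}} = 0.404 \]
For a dollar increase in fuel price,
\[ p_{\text{fuel price}} = \frac{1}{1+e^{-\beta \cdot \text{fuel price}}} \]
\[ p_{\text{fuel price}} = \frac{1}{1+e^{0.121675 \times 4.212528}} \]
\[ p_{\text{fuel price}} = 0.375 \]
The change in probability is 0.375 - 0.404 = -0.029.
Therefore, the conclusion of the online data analysis tutor is that a $1 increase in fuel price would reduce the probability of an inspection by about 0.029, in line with the negative fuel price coefficient in the log-odds equation.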
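The fuel price calculation can be checked with a few lines of Python. As in the text, the sketch looks at the fuel price term alone and leaves out the intercept and the other covariates; it takes the coefficient with the negative sign it has in the log-odds equation, and the names are illustrative.

```python
import math


def logistic(z):
    """Convert log odds z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


BETA_FUEL = -0.121675  # fuel price coefficient, negative as in the log-odds equation
MEAN_FUEL = 3.212528   # average fuel price

p_at_mean = logistic(BETA_FUEL * MEAN_FUEL)
p_plus_one = logistic(BETA_FUEL * (MEAN_FUEL + 1.0))
change = p_plus_one - p_at_mean

print(f"p at average fuel price:      {p_at_mean:.3f}")
print(f"p after a $1 increase:        {p_plus_one:.3f}")
print(f"change in probability:        {change:.3f}")
```

Flipping the sign of the coefficient gives the complementary probabilities (about 0.596 and 0.625), so the size of the change is 0.029 either way and only its direction depends on the sign.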