Assistance with Data Analysis Assignment involving a Model for Prediction Purpose

To obtain a model with high predictive power for this data analysis assignment, I start with a full model that includes all the variables and then apply backward selection: at each step, I remove the non-significant variable (p-value greater than 0.1) with the largest p-value and re-estimate the model. The initial model is:

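$$\text{weekly\_sales} = \beta_0 + \beta_1\,\text{Temperature} + \beta_2\,\text{Fuel\_Price} + \beta_3\,\text{MarkDown1} + \beta_4\,\text{MarkDown2} + \beta_5\,\text{MarkDown3} + \beta_6\,\text{MarkDown4} + \beta_7\,\text{MarkDown5} + \beta_8\,\text{CPI} + \beta_9\,\text{Unemployment} + \beta_{10}\,\text{Store\_sqft} + \beta_{11}\,\text{LargeCity} + \beta_{12}\,\text{MediumCity} + \beta_{13}\,\text{Inspection} + \varepsilon$$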

Since MarkDown1-5 must remain in the final model, I do not consider them for removal. Among the other variables, the results show that only the medium-city indicator is not significant (p=0.135). Therefore, I remove it from the model.

After re-estimating the model, the large-city indicator becomes insignificant and has the largest p-value (p=0.103), so I remove it. After the next re-estimation, temperature is the insignificant variable with the largest p-value (p=0.1088), so I remove it and re-estimate the model. Inspection is then the insignificant variable with the largest p-value (p=0.1088), so I remove it and re-estimate the model once more.
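The backward-selection procedure above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the code used for this assignment: `fit_pvalues` is a hypothetical callback standing in for whatever routine re-estimates the regression on the current regressors and returns each variable's p-value. The 0.1 threshold and the set of variables that are never dropped (MarkDown1-5) follow the text.

```python
from typing import Callable, Dict, List, Set


def backward_eliminate(
    variables: List[str],
    forced: Set[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    threshold: float = 0.1,
) -> List[str]:
    """Drop the least significant non-forced variable until every p-value is <= threshold.

    fit_pvalues is a hypothetical callback: given the current regressors, it
    re-estimates the model and returns {variable_name: p_value}.
    """
    current = list(variables)
    while True:
        # Re-estimate the model with the current set of regressors.
        pvalues = fit_pvalues(current)
        # Variables forced into the model (e.g. the markdowns) are never candidates.
        candidates = {v: p for v, p in pvalues.items() if v in current and v not in forced}
        if not candidates:
            return current
        worst, worst_p = max(candidates.items(), key=lambda item: item[1])
        if worst_p <= threshold:
            # Every remaining candidate is significant at the chosen level.
            return current
        current.remove(worst)
```

Because the model is re-estimated after every removal, a variable that looked significant earlier can become the next one dropped, which is exactly what happens with large city, temperature and inspection above.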

After these removals, every remaining variable other than the markdowns (which are kept regardless of significance) is statistically significant at the 10% level. The estimated model is therefore:

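$$\widehat{\text{weekly\_sales}} = -34045.475 + 4323.972\,\text{Fuel\_Price} - 0.2547\,\text{MarkDown1} + 0.0062\,\text{MarkDown2} - 0.0312\,\text{MarkDown3} + 0.0444\,\text{MarkDown4} - 0.7241\,\text{MarkDown5} + 35.039\,\text{CPI} + 4416.708\,\text{Unemployment} + 0.2822\,\text{Store\_sqft}$$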

The result is summarized in the regression table below. Fuel price has a significant positive effect on weekly sales: a one-dollar increase in fuel price increases weekly sales by $4,323.97 (p<0.001). MarkDown1 has a significant negative effect: a one-dollar increase in MarkDown1 reduces weekly sales by $0.25 (p=0.0069). MarkDown2 has a positive but insignificant effect: a one-dollar increase in MarkDown2 increases sales by $0.0062 (p=0.916). MarkDown3 has a negative but insignificant effect: a one-dollar increase in MarkDown3 reduces sales by $0.03 (p=0.516). MarkDown4 has a positive but insignificant effect: a one-dollar increase in MarkDown4 increases sales by $0.04 (p=0.761). MarkDown5 has a significant negative effect: a one-dollar increase in MarkDown5 reduces sales by $0.72 (p<0.001). CPI has a positive and significant effect: a one-unit increase in CPI increases sales by $35.04 (p=0.013). Unemployment has a positive and significant effect: a one-percentage-point increase in Unemployment increases sales by $4,416.71 (p<0.001). Store size has a positive and significant effect: a one-square-foot increase in store size increases sales by $0.28 (p=0.022).

The R-squared shows that the regressors in this model explain 18.28% of the variation in weekly sales (adjusted R-squared = 17.04%). The F-test shows that the overall model is statistically significant: F(9, 590) = 14.67, p<0.001.
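As a consistency check, the overall F-statistic and the adjusted R-squared can both be recovered from R-squared, the number of observations n and the number of slope coefficients k. The sketch below assumes k = 9 (the nine regressors in Table 1) and n = 600, the sample size that makes the reported R-squared, adjusted R-squared and F agree with one another; with these values it gives F(9, 590) of about 14.66 and an adjusted R-squared of about 0.170.

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F-test statistic for a regression with k slope coefficients and n observations."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared, which penalises R-squared for the number of regressors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# R-squared from Table 1; n = 600 and k = 9 are the assumptions described above.
R2, N, K = 0.1828, 600, 9
print(f"F({K}, {N - K - 1}) = {f_statistic(R2, N, K):.2f}")
print(f"adjusted R-squared = {adjusted_r2(R2, N, K):.4f}")
```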

Table 1: Regression table


| Variable | Coefficient | Std. Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | -34045.475 | 6945.834 | -4.90157 | 1.23E-06 |
| Fuel_Price | 4323.9719 | 995.7053 | 4.342622 | 1.66E-05 |
| MarkDown1 | -0.2546577 | 0.093879 | -2.71262 | 0.00687 |
| MarkDown2 | 0.006237 | 0.059163 | 0.10542 | 0.916078 |
| MarkDown3 | -0.0311915 | 0.047988 | -0.64999 | 0.515951 |
| MarkDown4 | 0.0444134 | 0.145924 | 0.304361 | 0.760961 |
| MarkDown5 | -0.7241132 | 0.121302 | -5.96953 | 4.11E-09 |
| CPI | 35.039088 | 14.07946 | 2.488667 | 0.013097 |
| Unemployment | 4416.708 | 582.68 | 7.579989 | 1.35E-13 |
| Store_sqft | 0.2822359 | 0.123361 | 2.287892 | 0.022497 |

| Fit statistic | Value |
|---|---|
| R-squared | 0.1828 |
| Adjusted R-squared | 0.1704 |
| F-statistic | 14.6658 |
| Significance F | 1.46E-21 |

Data Analysis for Sales Prediction

Figure 1: Plot of predicted sales for 3/12/2010 for the 100 stores

Figure 1 plots the predicted weekly sales for the week of 3/12/2010 for the 100 stores, and Table 2 reports descriptive statistics for the predicted and actual weekly sales.
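Each predicted value is the intercept plus each coefficient times the corresponding regressor. The sketch below hard-codes the Table 1 coefficients; the input values in the example row are hypothetical, chosen only to show the call, and do not come from the data.

```python
# Coefficients of the final model, copied from Table 1.
COEFS = {
    "Fuel_Price": 4323.9719,
    "MarkDown1": -0.2546577,
    "MarkDown2": 0.006237,
    "MarkDown3": -0.0311915,
    "MarkDown4": 0.0444134,
    "MarkDown5": -0.7241132,
    "CPI": 35.039088,
    "Unemployment": 4416.708,
    "Store_sqft": 0.2822359,
}
INTERCEPT = -34045.475


def predict_weekly_sales(row: dict) -> float:
    """Predicted weekly sales for one store-week; row must supply every regressor."""
    missing = set(COEFS) - set(row)
    if missing:
        raise KeyError(f"missing regressors: {sorted(missing)}")
    return INTERCEPT + sum(coef * row[name] for name, coef in COEFS.items())


# Hypothetical store-week, for illustration only (not taken from the data).
example = {
    "Fuel_Price": 3.2, "MarkDown1": 5000.0, "MarkDown2": 1000.0,
    "MarkDown3": 500.0, "MarkDown4": 2000.0, "MarkDown5": 3000.0,
    "CPI": 180.0, "Unemployment": 8.0, "Store_sqft": 100000.0,
}
print(round(predict_weekly_sales(example), 2))
```

Applying the same function to each of the 100 stores' values for 3/12/2010 would produce the kind of series plotted in Figure 1.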

Table 2: Summary Statistics


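| Statistic | Predicted weekly sales | Actual weekly sales |
|---|---|---|
| Mean | 20644.665 | 21228.292 |
| Median | 20886.36392 | 21795.495 |
| Standard deviation | 3250.073465 | 1307.006 |
| Sample variance | 10562977.53 | 1708264.720 |
| Kurtosis | 0.419042457 | -1.904 |
| Skewness | -0.640121747 | 0.073 |
| Minimum | 10322.26017 | 19928.220 |
| Maximum | 27185.29816 | 22897.852 |
| Count | 100 | 100 |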

The summary statistics above show that average actual weekly sales are $21,228.29, while average predicted weekly sales are $20,644.67. The predicted values vary more than the actual ones: their standard deviation is $3,250.07, compared with $1,307.01 for actual weekly sales. The lowest actual sale is $19,928.22, while the lowest predicted sale is $10,322.26. The highest actual sale is $22,897.85, while the highest predicted sale is $27,185.30.
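The layout of Table 2 resembles Excel's Descriptive Statistics output, which reports the sample (n - 1) variance, sample excess kurtosis and adjusted skewness. Assuming those definitions, the table can be reproduced with the sketch below; the skewness and kurtosis formulas are the sample-adjusted versions used by Excel's SKEW and KURT functions.

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics in the layout of Table 2 (Excel-style sample formulas)."""
    n = len(values)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 observations")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("values are constant; skewness and kurtosis are undefined")
    z = [(x - mean) / sd for x in values]
    # Adjusted Fisher-Pearson skewness, as computed by Excel's SKEW.
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    # Sample excess kurtosis, as computed by Excel's KURT.
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "Mean": mean,
        "Median": statistics.median(values),
        "Standard deviation": sd,
        "Sample variance": statistics.variance(values),
        "Kurtosis": kurt,
        "Skewness": skew,
        "Minimum": min(values),
        "Maximum": max(values),
        "Count": n,
    }
```

Running `describe` once on the 100 predicted values and once on the 100 actual values would fill the two columns of Table 2.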

The next objective is to test whether mean weekly sales differ between group 1 (the weeks of 2/5, 2/12 and 2/19) and group 2 (the weeks of 2/26, 3/5 and 3/12). The null and alternative hypotheses are:

$$H_0: \mu_{\text{group 1}} = \mu_{\text{group 2}}$$

$$H_1: \mu_{\text{group 1}} \neq \mu_{\text{group 2}}$$

The t-test results are presented in Table 3. First, the F-test for equality of variances shows no significant difference between the variances of the two groups (F=1.053, p=0.328), so we can assume equal variances when computing the t-statistic. The average predicted sales for group 1 ($20,551.53) are higher than those for group 2 ($20,494.18). However, the difference is not significant: t(598)=0.184, p=0.854. This means there is no significant difference between sales in group 1 and group 2.
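The test statistics in Table 3 can be reproduced from the group summaries alone. The sketch below computes the variance ratio, the pooled variance and the equal-variance t-statistic. For the p-value it uses a normal approximation (an assumption made to stay within Python's standard library), which is very close to the exact t distribution at 598 degrees of freedom.

```python
import math
from statistics import NormalDist


def pooled_t_test(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t-test assuming equal variances, computed from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se
    # Normal approximation to the two-tailed p-value (reasonable for large df).
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(t)))
    return {"pooled_var": pooled_var, "df": df, "t": t, "p_approx": p_two_tailed}


# Group summaries (mean, variance, observations) from Table 3.
g1 = (20551.53, 14943673, 300)
g2 = (20494.18, 14190730, 300)
print("variance ratio F =", round(g1[1] / g2[1], 3))
print(pooled_t_test(*g1, *g2))
```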

Table 3: Independent t-test result


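| Statistic | Group 1 | Group 2 |
|---|---|---|
| Mean | 20551.53 | 20494.18 |
| Variance | 14943673 | 14190730 |
| Observations | 300 | 300 |
| Pooled variance | 14567202 | |
| Degrees of freedom | 598 | |
| t Stat | 0.184052 | |
| P(T<=t) one-tail | 0.427018 | |
| t Critical one-tail | 1.647406 | |
| P(T<=t) two-tail | 0.854035 | |
| t Critical two-tail | 1.963939 | |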

Help with Data Analysis Homework on Fuel Price and its effect on Unemployment

It always helps in any data analysis homework to understand the variables that are present in the data. With the help of our diligent experts, we take this section to understand what variables are of interest in this data analysis task.

Endogeneity of Unemployment refers to a situation in which Unemployment, as an explanatory variable, is correlated with the error term. It may arise from omitted variables or from measurement error. If endogeneity is not dealt with, the OLS estimates will be biased and inconsistent, because the Gauss-Markov assumption that the regressors are uncorrelated with the error term is violated.

A valid instrument must satisfy two requirements. First (relevance), it must be correlated with the endogenous variable. Second (exogeneity), it must not be correlated with the error term of the outcome equation, conditional on the other covariates. Fuel price is a plausible instrument on both counts. It is related to Unemployment: when fuel prices rise, firms' production costs increase and their output falls, which implies less employment. In addition, we expect fuel price not to affect weekly sales directly, but only through its relationship with Unemployment, since unemployment lowers purchasing power and therefore sales. The correlation coefficient between fuel price and Unemployment is -0.429, a moderate correlation, which supports the relevance condition. Note, however, that the negative sign is the opposite of what the production-cost argument predicts; relevance only requires a non-zero correlation. The exogeneity condition cannot be tested directly with a single instrument and rests on the argument above.
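With a single instrument, the first-stage R-squared equals the squared correlation, and the first-stage F-statistic follows from it. The sketch below takes the reported correlation of -0.429 and assumes n = 600 observations (the sample size implied by the first-stage statistics in Table 4); it gives an R-squared of about 0.184 and an F of about 135. A common rule of thumb treats a first-stage F above 10 as evidence against a weak instrument.

```python
def first_stage_strength(r: float, n: int):
    """First-stage R-squared and F-statistic for one instrument, from the correlation r."""
    r2 = r ** 2
    f = r2 * (n - 2) / (1 - r2)  # F(1, n - 2), equal to the squared t-stat of the instrument
    return r2, f


r2, f = first_stage_strength(-0.429, 600)
print(f"first-stage R-squared = {r2:.3f}, F(1, 598) = {f:.1f}")
print("passes the F > 10 rule of thumb:", f > 10)
```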

The model is shown below.

The first stage is:

$$\text{unemployment} = \alpha + \gamma\,\text{fuel\_price} + \mu$$

where α is the intercept, γ is the slope, and μ is the error term from the first stage.

The second stage regresses weekly sales on the fitted values of Unemployment from the first stage. The second-stage model is:

$$\text{weekly\_sales} = \lambda + \beta\,\widehat{\text{unemployment}} + \varepsilon$$

where λ is the intercept, β is the slope, and ε is the error term from the second stage.

The results of the two-stage least squares estimation are presented in Table 4. The first stage shows that fuel price has a negative effect on Unemployment: a one-dollar increase in fuel price reduces Unemployment by 0.678 percentage points, and this effect is significant (p<0.0001). Fuel price explains 18.41% of the variation in Unemployment (adjusted R-squared = 18.27%), and the F-test shows that the first-stage model is significant (p<0.0001). The second stage shows that fitted Unemployment from the first stage has a positive but insignificant effect on predicted weekly sales (p=0.834): a one-percentage-point increase in Unemployment is associated with a $119.28 increase in sales. However, the model explains only 0.01% of the variation, and the F-test shows that it is not significant (p=0.834).

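Table 4: Two-stage least squares results

First Stage

| Variable | Coefficient | Std. Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 9.483127 | 0.188855 | 50.21378 | 1.2E-216 |
| Fuel_Price | -0.677448 | 0.058326 | -11.6148 | 2.9E-28 |

| Fit statistic | Value |
|---|---|
| R Square | 0.1841 |
| Adjusted R Square | 0.1827 |
| F | 134.9029 |
| Significance F | 2.9E-28 |

Second Stage

| Variable | Coefficient | Std. Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 19651.34 | 4157.345 | 4.726896 | 2.85E-06 |
| fitted_unemployment | 119.2752 | 568.5691 | 0.209781 | 0.83391 |

| Fit statistic | Value |
|---|---|
| R Square | 0.0001 |
| Adjusted R Square | -0.001599 |
| F | 0.044008 |
| Significance F | 0.83391 |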

Model Interpretation Demonstrated by our Online Data Analysis Tutors

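The linear probability model estimated for inspection is:

$$\text{inspection} = \alpha_0 + \alpha_1\,\text{Temperature} + \alpha_2\,\text{Fuel\_Price} + \alpha_3\,\text{CPI} + \alpha_4\,\text{Unemployment} + \alpha_5\,\text{weekly\_sales} + \alpha_6\,\text{Store\_sqft} + \alpha_7\,\text{LargeCity} + \alpha_8\,\text{MediumCity} + u$$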

The results of the model are presented in Table 5. Temperature has a positive but insignificant (p=0.848) effect on the probability of inspection: a one-degree increase in temperature increases the probability of inspection by approximately 0.0003, suggesting a slightly higher, though not statistically significant, probability of inspection in hot periods. Fuel price has a negative but insignificant (p=0.501) effect: a one-dollar increase in fuel price reduces the probability of inspection by approximately 0.04, suggesting a lower probability of inspection when fuel is costly. CPI has a negative but insignificant (p=0.547) effect: a one-unit rise in CPI reduces the probability of inspection by approximately 0.0005, suggesting a lower probability of inspection when prices are rising. Unemployment has a negative but insignificant (p=0.886) effect: a one-percentage-point increase in Unemployment reduces the probability of inspection by approximately 0.0054, suggesting a lower probability of inspection when unemployment is high. Weekly sales have a positive effect that is significant at the 10% level (p=0.050): a one-dollar increase in weekly sales increases the probability of inspection by approximately 4.697e-06, so stores with higher sales are more likely to be inspected. Store size has a positive but insignificant (p=0.229) effect: a one-square-foot increase in store size increases the probability of inspection by approximately 9.22e-06, suggesting a higher probability of inspection for larger stores.
The probability of inspection for stores in a large city is 0.128 higher than for stores in a small city, and this difference is significant (p=0.013). The probability of inspection for stores in a medium city is 0.085 higher than for stores in a small city, a difference that is significant at the 10% level (p=0.099).

Table 5: Linear Probability Model Result


| Variable | Coefficient | Std. Error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 0.3774 | 0.4328 | 0.8720 | 0.3836 |
| Temperature | 0.0003 | 0.0015 | 0.1920 | 0.8478 |
| Fuel_Price | -0.0424 | 0.0630 | -0.6731 | 0.5012 |
| CPI | -0.0005 | 0.0009 | -0.6021 | 0.5474 |
| Unemployment | -0.0054 | 0.0374 | -0.1438 | 0.8857 |
| weekly_sales | 0.0000 | 0.0000 | 1.9638 | 0.0500 |
| Store_sqft | 0.0000 | 0.0000 | 1.2035 | 0.2293 |
| LargeCity | 0.1279 | 0.0512 | 2.4973 | 0.0128 |
| MediumCity | 0.0848 | 0.0513 | 1.6534 | 0.0988 |

A shortcoming of the linear probability model is that its predicted probabilities can be less than 0 or greater than 1, even though a probability must lie between 0 and 1.
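The sketch below illustrates this shortcoming using the Table 5 coefficients, with the unrounded weekly_sales and Store_sqft coefficients quoted in the text (the table rounds them to 0.0000). The input rows are hypothetical: a row whose only non-zero regressor is a store size of 100,000 square feet already receives a fitted "probability" of about 1.30.

```python
# Linear probability model coefficients from Table 5; the weekly_sales and
# Store_sqft values come from the text because the table rounds them to 0.0000.
LPM_INTERCEPT = 0.3774
LPM_COEFS = {
    "Temperature": 0.0003, "Fuel_Price": -0.0424, "CPI": -0.0005,
    "Unemployment": -0.0054, "weekly_sales": 4.697e-06,
    "Store_sqft": 9.22e-06, "LargeCity": 0.1279, "MediumCity": 0.0848,
}


def lpm_fitted(row: dict) -> float:
    """Fitted value of the LPM; regressors absent from row are treated as 0."""
    return LPM_INTERCEPT + sum(c * row.get(name, 0.0) for name, c in LPM_COEFS.items())


def outside_unit_interval(rows) -> list:
    """Return the fitted values that are not valid probabilities."""
    fitted = [lpm_fitted(r) for r in rows]
    return [p for p in fitted if p < 0 or p > 1]


# Hypothetical rows, for illustration only.
rows = [{"Store_sqft": 100000.0}, {"Fuel_Price": 3.2, "weekly_sales": 21000.0}]
print(outside_unit_interval(rows))
```

Logit and probit models avoid this problem by passing the linear index through a function whose values always lie between 0 and 1.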

The logit results show that only store square footage is a significant predictor of the odds of inspection; all other variables are not significant. A one-dollar increase in weekly sales increases the log odds of inspection by 1.47e-05. A one-square-foot increase in store size increases the log odds of inspection by 6.13e-05. A one-dollar increase in fuel price reduces the log odds of inspection by 0.122. A one-unit increase in CPI reduces the log odds of inspection by 0.0026, and a one-percentage-point increase in Unemployment increases the log odds of inspection by 0.938.

The estimated logit model is:

$$\ell = \log\frac{p}{1-p} = -0.836 + 0.000015\,\text{weekly\_sales} + 0.000061\,\text{Store\_sqft} - 0.122\,\text{Fuel\_Price} - 0.0026\,\text{CPI} + 0.0129\,\text{Unemployment}$$
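Because logit coefficients are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to read. The sketch below applies this to the weekly_sales, Store_sqft, fuel price and CPI coefficients quoted for the logit model, using the unrounded fuel-price coefficient -0.121675 from the probability calculation; a one-dollar increase in fuel price multiplies the odds of inspection by about 0.885. The per-$1,000 and per-1,000-square-foot scaling in the example is an illustrative choice, not something taken from the analysis.

```python
import math

# Log-odds coefficients quoted for the logit model.
LOGIT_COEFS = {
    "weekly_sales": 1.47e-05,
    "Store_sqft": 6.13e-05,
    "Fuel_Price": -0.121675,
    "CPI": -0.0026,
}


def odds_ratios(coefs, unit_changes=None):
    """Multiplicative change in the odds for a given change in each regressor (default: 1 unit)."""
    unit_changes = unit_changes or {}
    return {name: math.exp(b * unit_changes.get(name, 1.0)) for name, b in coefs.items()}


# Report sales and store size per 1,000 units so the ratios are not all ~1.0000.
for name, ratio in odds_ratios(LOGIT_COEFS, {"weekly_sales": 1000.0, "Store_sqft": 1000.0}).items():
    print(f"{name}: {ratio:.4f}")
```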

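The average fuel price is $3.212528.

Keeping only the fuel-price term of the logit index, the probability at this average is

$$p(\text{fuel\_price}) = \frac{1}{1 + e^{-\beta \cdot \text{fuel\_price}}}$$

$$p(3.212528) = \frac{1}{1 + e^{-(-0.121675 \times 3.212528)}} \approx 0.404$$

For a one-dollar increase in fuel price:

$$p(4.212528) = \frac{1}{1 + e^{-(-0.121675 \times 4.212528)}} \approx 0.375$$

The change in probability is 0.375 - 0.404 = -0.029.

Therefore, an increase of $1 in fuel price reduces the probability of an inspection by about 0.029, consistent with the negative fuel-price coefficient. Because the intercept and the other regressors are left out of the index, this figure illustrates the direction and rough size of the effect rather than the exact marginal effect at the sample means.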
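The probability calculation can be checked in a few lines. The sketch evaluates p = 1 / (1 + exp(-index)), the inverse of the log-odds definition ℓ = log(p / (1 - p)) used for the logit model, and, like the calculation in the text, keeps only the fuel-price term (β = -0.121675) while ignoring the intercept and the other regressors.

```python
import math


def inverse_logit(index: float) -> float:
    """Probability implied by a log-odds index: p = 1 / (1 + exp(-index))."""
    return 1.0 / (1.0 + math.exp(-index))


BETA_FUEL = -0.121675   # fuel-price coefficient used in the text
MEAN_FUEL = 3.212528    # average fuel price

p_mean = inverse_logit(BETA_FUEL * MEAN_FUEL)
p_plus_one = inverse_logit(BETA_FUEL * (MEAN_FUEL + 1.0))
print(f"p at mean fuel price:  {p_mean:.4f}")
print(f"p after a $1 increase: {p_plus_one:.4f}")
print(f"change in probability: {p_plus_one - p_mean:+.4f}")
```

A fuller version would add the intercept and the other regressors, evaluated at their sample means, to the index before applying `inverse_logit`.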