Assistance with a Data Analysis Assignment Involving a Prediction Model
To get a model with high predictive power that would be of real help in solving this data analysis assignment, I started with a full model that includes all the variables. I then used backward selection, removing at each step the least significant variable (the one with the largest p-value, provided that p-value exceeds 0.1). The initial model is
Since markdowns 1-5 must be in the final model, I did not consider them for removal. The results show that only the medium-city dummy is not significant (p=0.135), so I removed it from the model.
After re-estimating the model, the large-city dummy became insignificant and had the largest p-value (p=0.103), so I removed it. After the next re-estimation, temperature was the insignificant variable with the largest p-value (p=0.1088), so I removed it and re-estimated the model. Inspection was then the insignificant variable with the largest p-value (p=0.1088), so I removed it and re-estimated the model again.
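The backward-selection loop described above can be sketched in Python. This is a minimal, dependency-free illustration under stated assumptions, not the code used for the assignment. The OLS fit is done by hand. The p-values use a normal approximation to the t distribution, which is reasonable only for large samples. The `keep` argument plays the role of the markdown variables that must stay in the model. The function names and the toy data are hypothetical.

```python
from statistics import NormalDist


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_pvalues(y, columns):
    """Fit y on an intercept plus `columns`; return {name: two-sided p-value}.

    p-values use a normal approximation to the t distribution (large-n assumption).
    """
    names = list(columns)
    X = [[1.0] + [columns[name][i] for name in names] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)
    pvalues = {}
    for j, name in enumerate(names, start=1):
        t = beta[j] / (sigma2 * inv[j][j]) ** 0.5
        pvalues[name] = 2 * (1 - NormalDist().cdf(abs(t)))
    return pvalues


def backward_select(y, columns, keep=(), alpha=0.1):
    """Repeatedly drop the removable variable with the largest p-value above alpha."""
    current = dict(columns)
    while True:
        pvalues = ols_pvalues(y, current)
        removable = {name: p for name, p in pvalues.items() if name not in keep}
        if not removable:
            return sorted(current)
        worst = max(removable, key=removable.get)
        if removable[worst] <= alpha:
            return sorted(current)
        del current[worst]  # drop it, then re-estimate on the next pass


# Hypothetical toy data: y depends on x1; z is orthogonal to x1 and to the error e.
x1 = [0, 1, 2, 3, 4, 5, 6, 7]
z = [1, 1, -1, -1, -1, -1, 1, 1]
e = [1, -1, -1, 1, 1, -1, -1, 1]
y = [1 + 2 * a + b for a, b in zip(x1, e)]
print(backward_select(y, {"x1": x1, "z": z}))  # ['x1']
```

Passing `keep=("z",)` would retain `z` regardless of its p-value, which mirrors how markdowns 1-5 were treated. In practice a statistics package would supply exact t-based p-values.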
The results show that, apart from the markdown variables (which were kept regardless of significance), all remaining variables are statistically significant. The estimated model is thus:
The results are summarized in the regression table below. Fuel price has a significant positive effect on weekly sales: a one-dollar increase in fuel price increases weekly sales by $4,323.97 (p<0.001). Markdown 1 has a significant negative effect on weekly sales: a one-dollar increase in markdown 1 reduces sales by $0.25 (p=0.0069). Markdown 2 has a positive but insignificant effect: a one-dollar increase in markdown 2 increases sales by $0.0046 (p=0.916). Markdown 3 has a negative but insignificant effect: a one-dollar increase in markdown 3 reduces sales by $0.03 (p=0.516). Markdown 4 has a positive but insignificant effect: a one-dollar increase in markdown 4 increases sales by $0.04 (p=0.761). Markdown 5 has a significant negative effect: a one-dollar increase in markdown 5 reduces sales by $0.72 (p<0.001). CPI has a positive and significant effect: a one-unit increase in CPI increases sales by $35.03 (p=0.013). Unemployment has a positive and significant effect: a one-percentage-point increase in Unemployment increases sales by $4,416.71 (p<0.001). Store size has a positive and significant effect: a one-square-foot increase in store size increases sales by $0.28 (p=0.038).
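Because the model is linear, the slopes combine additively. The predicted change in weekly sales for any set of changes in the predictors is the sum of each slope multiplied by its change, and the intercept cancels out. The following is a minimal sketch using the rounded slopes reported above. The dictionary keys and the scenario are hypothetical labels chosen for illustration.

```python
# Slope estimates as reported in the text (rounded): dollars of weekly sales
# per one-unit change in each predictor. Key names are hypothetical labels.
SLOPES = {
    "fuel_price": 4323.97,
    "markdown1": -0.25,
    "markdown2": 0.0046,
    "markdown3": -0.03,
    "markdown4": 0.04,
    "markdown5": -0.72,
    "cpi": 35.03,
    "unemployment": 4416.71,
    "store_size": 0.28,
}


def predicted_change(changes):
    """Change in predicted weekly sales for {predictor: change in that predictor}."""
    return sum(SLOPES[name] * delta for name, delta in changes.items())


# Hypothetical scenario: fuel price up $0.10 and markdown 5 up $1,000.
print(round(predicted_change({"fuel_price": 0.10, "markdown5": 1000}), 2))  # -287.6
```

In this scenario the $432.40 gain associated with fuel price is outweighed by the $720 reduction associated with markdown 5.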
The R-squared shows that the independent variables in this model explain 17.04% of the variation in weekly sales. The F-test shows that the overall model is statistically significant, F(10,127)=14.67, p<0.001.
Table 1: Regression table
Figure 1: Plot of predicted sales for 3/12/2010 for the 100 stores
The plot of the predicted weekly sales for 3/12/2010 is shown above, and Table 2 shows the descriptive statistics.
Table 2: Summary Statistics
The summary statistics above show that average actual weekly sales are $21,228.29, while average predicted weekly sales are $20,644.67. The predicted values vary more ($3,250.07) than actual weekly sales ($1,307.00). The smallest actual sale is $19,928, while the smallest predicted sale is $10,322.26. The largest actual sale is $22,897.85, while the largest predicted sale is $27,185.30.
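A comparison like the one in Table 2 can be sketched with Python's standard `statistics` module. The text does not state which measure of variation it reports, so this sketch assumes the sample standard deviation. The input lists are hypothetical numbers, not the assignment data.

```python
from statistics import mean, stdev


def summarize(values):
    """Mean, sample standard deviation, minimum and maximum of a list of sales."""
    return {
        "mean": mean(values),
        "sd": stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical numbers for illustration only (not the assignment data).
actual = [100.0, 110.0, 120.0]
predicted = [90.0, 110.0, 130.0]
for label, values in (("actual", actual), ("predicted", predicted)):
    print(label, summarize(values))
```

As in Table 2, predictions with the same mean can still spread more widely than the actual values.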
The objective was to test whether mean weekly sales differ between group 1 (the weeks of 2/5, 2/12, and 2/19) and group 2 (the weeks of 2/26, 3/5, and 3/12). The null and alternative hypotheses are $H_0: \mu_1 = \mu_2$ and $H_1: \mu_1 \neq \mu_2$, where $\mu_1$ and $\mu_2$ are the mean weekly sales of group 1 and group 2.
The t-test results are presented in Table 3. First, the test for equality of variances shows no significant difference between the variances of the two groups (F=1.053, p=0.328), so we can assume equal variances when computing the t-statistic. The t-test shows that average predicted sales for group 1 ($20,551.53) are greater than those for group 2 ($20,494.18). However, the difference is not significant, t(598)=0.184, p=0.854, so there is no significant difference in sales between group 1 and group 2.
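The equal-variance (pooled) t-test used above can be sketched as follows. The degrees of freedom are n1 + n2 - 2. The reported t(598) is therefore consistent with 300 observations per group, for example 100 stores over 3 weeks. The variance-ratio convention (larger variance over smaller) and the sample data are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance


def variance_ratio(a, b):
    """F statistic for equal variances: larger sample variance over the smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def pooled_t(a, b):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df  # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


# Hypothetical samples (not the assignment data).
group1 = [1.0, 2.0, 3.0]
group2 = [2.0, 3.0, 4.0]
print(variance_ratio(group1, group2))  # 1.0
print(pooled_t(group1, group2))
```

This sketch returns only the statistic. The p-value needs the t distribution, which a library routine such as SciPy's `scipy.stats.ttest_ind(a, b, equal_var=True)` also provides.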
Table 3: Independent t-test result
Help with Data Analysis Homework on Fuel Price and Its Effect on Unemployment
In any data analysis homework, it helps to understand the variables present in the data. With the help of our diligent experts, we use this section to examine the variables of interest in this data analysis task.
Endogeneity of Unemployment refers to a situation in which Unemployment, as an explanatory variable, is correlated with the error term. It may arise from omitted variables or measurement error. If endogeneity is not dealt with, the OLS estimates will be biased and inconsistent, because the Gauss-Markov assumption that the regressors are uncorrelated with the error term is violated.
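A small simulation shows how this bias arises. This is a hypothetical data-generating process, not the store data. The regressor `x` is built to share the error term `u` with the outcome `y`, so OLS attributes part of `u`'s effect to `x`. In theory the OLS slope converges to 2 + cov(x, u)/var(x) = 2 + 1/2 = 2.5 rather than the true value of 2.

```python
import random


def ols_slope(x, y):
    """Simple-regression OLS slope: cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 20_000
u = [rng.gauss(0, 1) for _ in range(n)]        # structural error term
z = [rng.gauss(0, 1) for _ in range(n)]        # exogenous driver of x
x = [zi + ui for zi, ui in zip(z, u)]          # x is correlated with u: endogenous
y = [1 + 2 * xi + ui for xi, ui in zip(x, u)]  # the true slope on x is 2

print(round(ols_slope(x, y), 2))  # close to 2.5, not 2
```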
There are two requirements for a variable to be regarded as a valid instrument. First, it must be correlated with the endogenous variable (relevance). Second, it must not be correlated with the error term in the outcome equation, conditional on the other covariates (exogeneity). Based on these two requirements, fuel price could make a good instrument. It is correlated with Unemployment: when fuel prices rise, firms' production costs increase and their output falls, which implies less employment. We also expect fuel price not to affect weekly sales directly, but only through its relationship with Unemployment, because Unemployment lowers purchasing power and therefore sales. The correlation coefficient between fuel price and Unemployment is -0.429, a moderate correlation. This supports the relevance requirement. The exogeneity requirement cannot be tested directly and rests on the argument above.
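The model is estimated in two stages. In the first stage, Unemployment is regressed on the instrument, fuel price:
$$\text{Unemployment}_i = \gamma_0 + \gamma_1\,\text{FuelPrice}_i + \mu_i$$
where $\gamma_0$ is the intercept, $\gamma_1$ is the slope, and $\mu_i$ is the error term from the first stage.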
The second stage regresses weekly sales on the fitted value of Unemployment from the first stage:
$$\text{WeeklySales}_i = \beta_0 + \beta_1\,\widehat{\text{Unemployment}}_i + \varepsilon_i$$
where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\varepsilon_i$ is the error term from the second stage.
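The two stages can be sketched by hand in Python. This is a minimal illustration on hypothetical numbers, not the assignment data. Here `z` stands in for fuel price, `x` for Unemployment, and `y` for weekly sales. With one instrument and one endogenous regressor, the second-stage slope equals the simple IV ratio cov(z, y) / cov(z, x), and the sketch computes both as a check.

```python
def fit_line(x, y):
    """OLS intercept and slope for a simple regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def two_stage_least_squares(z, x, y):
    """Manual 2SLS with one instrument z for one endogenous regressor x."""
    g0, g1 = fit_line(z, x)             # first stage: x on z
    x_hat = [g0 + g1 * zi for zi in z]  # fitted values of x
    return fit_line(x_hat, y)           # second stage: y on fitted x


def iv_slope(z, x, y):
    """Just-identified IV slope: cov(z, y) / cov(z, x)."""
    mz, mx, my = sum(z) / len(z), sum(x) / len(x), sum(y) / len(y)
    czy = sum((a - mz) * (c - my) for a, c in zip(z, y))
    czx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return czy / czx


# Hypothetical data for illustration only.
z = [2.8, 3.0, 3.1, 3.3, 3.5, 3.6]
x = [8.4, 8.1, 8.2, 7.7, 7.6, 7.3]
y = [20.1, 21.0, 20.5, 21.7, 21.2, 22.0]
print(two_stage_least_squares(z, x, y)[1], iv_slope(z, x, y))  # equal slopes
```

Running the second stage as an ordinary regression gives the correct 2SLS point estimate. However, its standard errors are not valid because they are built from residuals computed with the fitted rather than the actual Unemployment. Dedicated IV routines correct for this.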
The results of the two-stage least squares estimation are presented in Table 4. The first stage shows that fuel price has a negative effect on Unemployment: a one-dollar increase in fuel price reduces Unemployment by 0.678 percentage points, and this effect is significant (p<0.0001). Fuel price explains 18.27% of the variation in Unemployment, and the F-test shows that the first-stage model is significant (p<0.0001). The second stage shows that the fitted Unemployment from the first stage has a positive but insignificant effect on predicted weekly sales (p=0.834): a one-percentage-point increase in Unemployment is associated with a $119.27 increase in sales. The second stage explains only 0.01% of the variation, and the F-test shows that this model is not significant (p=0.834).
Table 4: Two-stage least squares result
Model Interpretation Demonstrated by our Online Data Analysis Tutors
The data generating model is
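$$\text{Inspection}_i = \alpha_0 + \alpha_1\,\text{Temperature}_i + \alpha_2\,\text{FuelPrice}_i + \alpha_3\,\text{CPI}_i + \alpha_4\,\text{Unemployment}_i + \alpha_5\,\text{WeeklySales}_i + \alpha_6\,\text{StoreSqft}_i + \alpha_7\,\text{LargeCity}_i + \alpha_8\,\text{MediumCity}_i + \varepsilon_i$$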
The results of the model are presented in Table 5. Temperature has a positive but insignificant (p=0.85) effect on the probability of inspection: a one-degree increase in temperature increases the probability of inspection by approximately 0.38, suggesting a higher probability of inspection during hot periods. Fuel price has a negative but insignificant (p=0.501) effect: a one-dollar increase in fuel price reduces the probability of inspection by approximately 0.04, suggesting a lower probability of inspection when fuel is expensive. CPI has a negative but insignificant (p=0.574) effect: a one-unit rise in CPI reduces the probability of inspection by approximately 0.0005, suggesting a lower probability of inspection during periods of rising prices. Unemployment has a negative but insignificant (p=0.8857) effect: a one-percentage-point increase in Unemployment reduces the probability of inspection by approximately 0.0054, suggesting a lower probability of inspection during periods of high Unemployment. Weekly sales have a positive effect that is significant at the 10% level (p=0.05): a one-dollar increase in weekly sales increases the probability of inspection by approximately 4.697e-06, so stores with higher sales are more likely to be inspected. Store square footage has a positive but insignificant (p=0.229) effect: a one-square-foot increase in store size increases the probability of inspection by approximately 9.22e-06, suggesting that larger stores are more likely to be inspected. Because weekly sales is the only significant variable in this paragraph, the other directional interpretations are only suggestive.
The probability of inspecting a store in a large city is 0.128 higher than for a store in a small city, and this difference is significant (p=0.013). The probability of inspecting a store in a medium city is 0.084 higher than for a store in a small city, a difference that is significant at the 10% level (p=0.099).
Table 5: Linear Probability Model Result
A shortcoming of the linear probability model is that its predicted probabilities can be less than 0 or greater than 1. This is impossible, because a probability must lie between 0 and 1.
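The problem is easy to see in a sketch. A linear probability model predicts with a straight line, so nothing keeps the fitted value inside [0, 1] once the regressor moves far enough. The intercept and slope below are hypothetical values chosen only to illustrate the point.

```python
def lpm_probability(intercept, slope, x):
    """Fitted 'probability' from a one-regressor linear probability model."""
    return intercept + slope * x


# Hypothetical coefficients, chosen only to illustrate the problem.
INTERCEPT, SLOPE = 0.1, 0.3
for x in (-1.0, 0.0, 2.0, 4.0):
    p = lpm_probability(INTERCEPT, SLOPE, x)
    inside = 0.0 <= p <= 1.0
    print(f"x={x:>4}: fitted value {p:.2f}, valid probability: {inside}")
```

A logit model avoids this by passing the linear predictor through the logistic function, which always returns a value strictly between 0 and 1.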
The logistic regression results show that only store square footage is a significant predictor of the odds of inspection; all other variables are not significant. A one-dollar increase in weekly sales increases the log odds of inspection by 1.47e-05. A one-square-foot increase in store size increases the log odds of inspection by 6.13e-05. A one-dollar increase in fuel price reduces the log odds of inspection by 0.121. A one-unit increase in CPI reduces the log odds of inspection by 0.0026, and a one-percentage-point increase in Unemployment increases the log odds of inspection by 0.938.
The data generating process is
$$\ell = \log\frac{p}{1-p} = -0.836 + 0.000015\,\text{WeeklySales} + 0.000061\,\text{StoreSqft} - 0.122\,\text{FuelPrice} - 0.0026\,\text{CPI} + 0.0129\,\text{Unemployment}$$
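To turn the fitted equation into a predicted probability, compute the log odds and pass them through the logistic function p = 1 / (1 + e^(-l)). In this minimal sketch the coefficients are copied from the fitted equation. The dictionary keys and the store profile are hypothetical values chosen for illustration.

```python
from math import exp

# Coefficients of the fitted logit equation (log odds of inspection).
# Key names are hypothetical labels for the variables in the equation.
COEF = {
    "intercept": -0.836,
    "weekly_sales": 0.000015,
    "store_sqft": 0.000061,
    "fuel_price": -0.122,
    "cpi": -0.0026,
    "unemployment": 0.0129,
}


def log_odds(values):
    """Linear predictor: intercept plus coefficient times value for each variable."""
    return COEF["intercept"] + sum(COEF[name] * v for name, v in values.items())


def probability(values):
    """Convert log odds to a probability with the logistic function."""
    return 1 / (1 + exp(-log_odds(values)))


# Hypothetical store profile, for illustration only.
store = {"weekly_sales": 21000, "store_sqft": 30000, "fuel_price": 3.21,
         "cpi": 170, "unemployment": 8}
print(round(probability(store), 3))
```

Whatever the inputs, the result stays strictly between 0 and 1, unlike the linear probability model.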
The average fuel price is $3.212528.
At this average, the predicted probability of inspection, $p = 1/(1+e^{-\ell})$, is 0.625.
For a one-dollar increase in fuel price (to $4.212528), the predicted probability falls to 0.596.
The change in probability is 0.596 - 0.625 = -0.029.
Therefore, the conclusion of the online data analysis tutor was that a $1 increase in fuel price would reduce the probability of an inspection by about 0.029.
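The calculation above can be checked with a short sketch. In a logit model, changing one regressor by `delta` shifts the log odds by coefficient × delta, so logit(p1) = logit(p0) + coef × delta. The sketch takes the baseline probability of 0.625 from the text as given, since the values of the other variables behind it are not reported. The function name is hypothetical.

```python
from math import exp, log


def shift_probability(p0, coef, delta=1.0):
    """New probability after changing one regressor by `delta`, other things fixed.

    In a logit model the change acts on the log odds:
    logit(p1) = logit(p0) + coef * delta.
    """
    new_log_odds = log(p0 / (1 - p0)) + coef * delta
    return 1 / (1 + exp(-new_log_odds))


p0 = 0.625                               # probability at the average fuel price (from the text)
p1 = shift_probability(p0, coef=-0.122)  # fuel-price coefficient from the fitted equation
print(round(p1, 3), round(p1 - p0, 3))   # 0.596 -0.029
```

The derivative approximation coef × p0 × (1 - p0) = -0.122 × 0.625 × 0.375 ≈ -0.029 gives nearly the same answer for a one-dollar change.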