• Matrix of correlation
• Why is it interesting for this data set to consider a model of order 4 relative to quarterly statistics?
• ANOVA Analysis

# Quantitative analysis

Quantitative analysis is the process of collecting measurable data and evaluating it. Typical examples of such data are market shares, revenues, and wages. The analysis helps people understand how their businesses are performing. In the past, decision makers relied mainly on experience; today, quantitative analysis is used to measure performance and support decisions.

## Matrix of correlation

|  | NCARREG | IMPORTS | RATE |
|---|---|---|---|
| NCARREG | 1 |  |  |
| IMPORTS | 0.653606 | 1 |  |
| RATE | -0.69246 | -0.90112 | 1 |

There is a clear positive correlation (0.654) between imports and new car registrations: over the period 1992Q1 to 2018Q1, quarters with more newly registered cars also tend to have higher import values. By contrast, the rate is negatively correlated with both new car registrations (-0.692) and, very strongly, imports (-0.901). A negative correlation means that high values of imports or registrations tend to coincide with low rates, and vice versa; it describes co-movement over the study period rather than a fixed change in the rate for every unit change in the other variables.
The time series graphs below further support this.
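To make the strength labels concrete, the short Python sketch below classifies each pair in the reported correlation matrix by sign and strength. The cut-offs (|r| ≥ 0.7 strong, |r| ≥ 0.5 moderate) are a common rule of thumb assumed here, not part of the original analysis, and the helper name `describe` is hypothetical.

```python
# Lower triangle of the reported correlation matrix.
CORR = {
    ("IMPORTS", "NCARREG"): 0.653606,
    ("RATE", "NCARREG"): -0.69246,
    ("RATE", "IMPORTS"): -0.90112,
}


def describe(r: float, strong: float = 0.7, moderate: float = 0.5) -> str:
    """Label a Pearson correlation by sign and strength.

    The 0.7 / 0.5 cut-offs are an assumed rule of thumb.
    """
    sign = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {sign}"


labels = {pair: describe(r) for pair, r in CORR.items()}
for (a, b), label in labels.items():
    print(f"{a} vs {b}: r = {CORR[(a, b)]:+.3f} ({label})")
```

With these cut-offs, the IMPORTS and RATE pair is the only strong one, which is worth keeping in mind when both variables enter the same regression.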
Estimate a regression model of the following form for the period 1992Q1 to 2018Q1:
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.69714 |
| R Square | 0.486004 |
| Adjusted R Square | 0.470736 |
| Standard Error | 7687.81 |
| Observations | 105 |

| Variable | Coefficient | Standard error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 39302.54 | 9449.291 | 4.159311 | 6.71E-05 |
| IMPORTS | 14.2145 | 85.54235 | 0.166169 | 0.868356 |
| RATE | -1889.57 | 1068.645 | -1.7682 | 0.080048 |
| TREND | 78.22917 | 130.3307 | 0.600236 | 0.549693 |

$$\widehat{\text{NCARREG}}_t = 39302.54 + 14.2145\,\text{IMPORTS}_t - 1889.57\,\text{RATE}_t + 78.22917\,\text{TREND}_t$$
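Holding the other variables constant, a one-unit increase in IMPORTS is associated with an increase of 14.2145 in NCARREG, a one-unit increase in RATE with a decrease of 1,889.57, and each additional quarter (TREND) with an increase of 78.23. At the 10% level only RATE is significant (p = 0.080); IMPORTS (p = 0.868) and TREND (p = 0.550) are not.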
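A minimal sketch of how the estimated equation is read: each coefficient is the change in fitted NCARREG for a one-unit change in that regressor, holding the others fixed. The regressor values below are hypothetical, chosen only to illustrate the arithmetic, and `fitted_ncarreg` is an illustrative helper name.

```python
# Coefficients of the 1992Q1 to 2018Q1 model, as reported in the output.
COEF = {"const": 39302.54, "IMPORTS": 14.2145, "RATE": -1889.57, "TREND": 78.22917}


def fitted_ncarreg(imports: float, rate: float, trend: float) -> float:
    """Fitted NCARREG from the estimated linear model."""
    return (COEF["const"] + COEF["IMPORTS"] * imports
            + COEF["RATE"] * rate + COEF["TREND"] * trend)


# Hypothetical regressor values, for illustration only.
base = fitted_ncarreg(imports=100.0, rate=5.0, trend=50.0)
# Raising RATE by one unit, holding IMPORTS and TREND fixed, shifts the
# fitted value by exactly the RATE coefficient.
effect_of_rate = fitted_ncarreg(100.0, 6.0, 50.0) - base
print(round(base, 2), round(effect_of_rate, 2))
```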
To improve the model, dynamics are introduced: the two explanatory variables (IMPORTS and RATE) are lagged by four periods, equivalent to one year. The model to be estimated for the period 1993Q1 to 2018Q1 is now of the form:
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.679677 |
| R Square | 0.461961 |
| Adjusted R Square | 0.445321 |
| Standard Error | 7695.47 |
| Observations | 101 |

| Variable | Coefficient | Standard error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 49344.56 | 10777.07 | 4.578662 | 1.39E-05 |
| TREND | 41.77199 | 146.8604 | 0.284433 | 0.776685 |
| IMPORTS LAG | -25.4673 | 92.8713 | -0.27422 | 0.784497 |
| RATELAG | -2799.19 | 1243.621 | -2.25084 | 0.026653 |

$$\widehat{\text{NCARREG}}_t = 49344.56 - 25.4673\,\text{IMPORTS}_{t-4} - 2799.19\,\text{RATE}_{t-4} + 41.77199\,\text{TREND}_t$$
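With four-quarter lags, new car registrations are negatively related to both lagged imports and the lagged rate, and positively related to the trend. The sign of imports changes from positive in the first model to negative here, but only the lagged rate is significant at the 10% level (p = 0.027); lagged imports (p = 0.784) and the trend (p = 0.777) are not.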
Conduct a model control and investigate the presence of autocorrelation using the Durbin-Watson test. Save the residuals from this regression and call this variable RESID. Comment on the signs of the included variables. Assume a 10 percent level of significance. Also compare with the model found under question B.
Durbin-Watson test

| Statistic | Value |
|---|---|
| Sum of squared residuals | 5.96934e+09 |
| Sum of squared differences of residuals | 4.70026e+09 |
| Durbin-Watson statistic | 0.7874 |

The Durbin-Watson statistic, 4.70026e9 / 5.96934e9 = 0.787, is well below 2, which indicates positive first-order autocorrelation: residuals that are adjacent in time are strongly similar.
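The Durbin-Watson statistic can be computed directly from a residual series. The sketch below implements the textbook formula, reproduces the reported value from the two sums in the table, and uses two toy residual series (hypothetical) to show that values near 0 signal positive autocorrelation and values near 4 negative autocorrelation.

```python
def durbin_watson(resid: list[float]) -> float:
    """Sum of squared successive residual differences over the sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


# Reproduce the reported statistic from the two sums in the table.
dw_reported = 4.70026e9 / 5.96934e9
print(round(dw_reported, 4))  # 0.7874

# Toy residual series (hypothetical) showing the two directions:
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0: strongest positive autocorrelation
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: strong negative autocorrelation
```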
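The lagged model explains about 46.2% of the variation in new car registrations (R² = 0.462, adjusted R² = 0.445), slightly less than the model in question B, which explains about 48.6% (R² = 0.486, adjusted R² = 0.471). Multiple R (0.680 and 0.697) is the correlation between actual and fitted values; the share of variation explained is its square, R².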
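R² and adjusted R² follow from the reported figures with two simple formulas. This sketch, assuming k = 3 regressors besides the intercept in both models (IMPORTS, RATE and TREND, lagged or not), recovers the values in the two outputs; the function names are illustrative.

```python
def r_squared_from_multiple_r(multiple_r: float) -> float:
    """R^2 is the square of Multiple R (the correlation of fitted and actual values)."""
    return multiple_r ** 2


def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k regressors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


r2_first = r_squared_from_multiple_r(0.69714)    # first model: n = 105, k = 3
r2_lagged = r_squared_from_multiple_r(0.679677)  # lagged model: n = 101, k = 3
print(round(r2_first, 4), round(adjusted_r_squared(0.486004, 105, 3), 4))
print(round(r2_lagged, 4), round(adjusted_r_squared(0.461961, 101, 3), 4))
```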
Expand the initial model above to an autoregressive (AR) model of order 4. This is an AR-4 model for the period 1994Q1 to 2018Q1. Estimate a model of the form
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.838416 |
| R Square | 0.702941 |
| Adjusted R Square | 0.679039 |
| Standard Error | 5443.02 |
| Observations | 95 |

| Variable | Coefficient | Standard error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 47268.57699 | 7963.301 | 5.935802 | 5.83E-08 |
| IMPORTS LAG | -58.76740547 | 67.89612 | -0.86555 | 0.389118 |
| RATELAG | -2263.925707 | 925.0793 | -2.44728 | 0.016404 |
| t-1 | 0.368681017 | 0.096027 | 3.839331 | 0.000234 |
| t-2 | 0.025210739 | 0.10695 | 0.235724 | 0.814201 |
| t-3 | -0.028588272 | 0.106814 | -0.26765 | 0.789605 |
| t-4 | 0.475936119 | 0.097676 | 4.87262 | 4.9E-06 |
| Trend | 113.0254443 | 107.3485 | 1.052883 | 0.29531 |

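$$
\begin{aligned}
\widehat{\text{NCARREG}}_t ={}& 47268.58 - 58.7674\,\text{IMPORTS}_{t-4} - 2263.93\,\text{RATE}_{t-4} + 113.0254\,\text{TREND}_t \\
&+ 0.3687\,\text{RESID}_{t-1} + 0.0252\,\text{RESID}_{t-2} - 0.0286\,\text{RESID}_{t-3} + 0.4759\,\text{RESID}_{t-4}
\end{aligned}
$$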
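The lagged regressors (IMPORTS and RATE lagged four quarters, and the lagged terms t-1 to t-4) are built by shifting a series back in time, and each lag of k periods costs the first k observations. This is why the four-quarter-lag model starts in 1993Q1 with 101 observations instead of 105. A minimal sketch with hypothetical data and illustrative helper names `lag` and `complete_rows`:

```python
from typing import Optional


def lag(series: list[float], k: int) -> list[Optional[float]]:
    """Shift a series k periods back; the first k periods have no lagged value."""
    if k <= 0:
        return list(series)
    k = min(k, len(series))
    return [None] * k + list(series[: len(series) - k])


def complete_rows(*columns: list) -> list[tuple]:
    """Keep only periods where every column has a value (listwise deletion)."""
    return [row for row in zip(*columns) if all(v is not None for v in row)]


# Hypothetical quarterly data, used only to show the mechanics.
y = [10.0, 12.0, 9.0, 11.0, 10.5, 12.5, 9.5, 11.5]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

rows = complete_rows(y, lag(x, 4))  # pairs (y_t, x_{t-4})
print(len(y), len(rows))  # 8 4: four quarters are lost to the lag
print(rows[0])            # (10.5, 1.0)
```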

## Why is it interesting for this data set to consider a model of order 4 relative to quarterly statistics?

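Because the data are quarterly, a lag of four periods links each quarter with the same quarter of the previous year, so an order-4 model can capture the annual (seasonal) pattern in the series. The output supports this: the fourth lag is large and highly significant (0.476, p = 4.9E-06), as is the first lag (0.369, p = 0.0002), while the second and third lags are insignificant. Adding the lags raises R² from about 0.46 to 0.49 in the models without lagged terms to 0.703 (adjusted R² = 0.679), so the AR-4 model explains about 70% of the variation in new car registrations.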
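Durbin-Watson test

| Statistic | Value |
|---|---|
| Sum of squared residuals | 2.5775e+09 |
| Sum of squared differences of residuals | 3.5358e+09 |
| Durbin-Watson statistic | 1.3718 |

The Durbin-Watson statistic is the sum of squared residual differences divided by the sum of squared residuals, 3.5358e9 / 2.5775e9 ≈ 1.372. It is still below 2, so positive autocorrelation remains in the residuals, although it is weaker than in the model without lagged terms (0.787).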
*Figure: Residual plot.*

1E

| Variable | Coefficient | Standard error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 9.194577 | 0.452701 | 20.31051 | 1.26E-37 |
| LIMPORTS | 0.306131 | 0.092492 | 3.309828 | 0.001291 |
| LRATE | -0.12427 | 0.028566 | -4.35023 | 3.23E-05 |

$$\widehat{\text{LNCARREG}} = 9.194577 + 0.306131\,\text{LIMPORTS} - 0.12427\,\text{LRATE}$$
Save the residuals from this model and call them ECMRESID. Second, estimate the short-run ECM model.
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.034987219 |
| R Square | 0.001224105 |
| Adjusted R Square | -0.028442505 |
| Standard Error | 0.897380716 |
| Observations | 105 |

| Variable | Coefficient | Standard error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 0.608732812 | 0.09918 | 6.137651 | 1.66E-08 |
| LRATE4TH DIFF | 0.040354728 | 0.157096 | 0.25688 | 0.797794 |
| LIMPORT4TH DIFF | -0.349327418 | 1.501216 | -0.2327 | 0.816468 |
| ECMRESID | 0.054029855 | 0.471968 | 0.114478 | 0.909086 |

### How good is the model, and how is the model control?

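The model is a poor fit. R² is only 0.0012, so the regressors explain about 0.1% of the variation, and adjusted R² is negative (-0.028). None of the slope coefficients is significant (all p-values are above 0.79). In particular, the error-correction term ECMRESID has a positive coefficient (0.054) with p = 0.909, whereas a valid error-correction mechanism requires a negative and significant coefficient, indicating adjustment back toward the long-run equilibrium.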
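Two ingredients of this model control can be sketched in Python: the fourth (seasonal) log-difference used for the short-run regressors, and the sign-and-significance check applied to the error-correction term. The helper names are hypothetical, the 10% significance level is assumed from the earlier questions, and the example series is made up.

```python
import math


def seasonal_log_diff(xs: list[float], k: int = 4) -> list[float]:
    """k-th (seasonal) difference of logs: ln x_t - ln x_{t-k}.

    For quarterly data with k = 4 this approximates the year-on-year growth rate.
    """
    return [math.log(xs[t]) - math.log(xs[t - k]) for t in range(k, len(xs))]


def error_corrects(coef: float, p_value: float, alpha: float = 0.10) -> bool:
    """A valid error-correction term needs a negative, significant coefficient."""
    return coef < 0 and p_value < alpha


# Hypothetical series that doubles year on year: every 4th difference is ln 2.
print(seasonal_log_diff([1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0]))

# ECMRESID coefficient and p-value from the short-run output.
print(error_corrects(0.054029855, 0.909086))  # False
```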

#### Regression statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.290141 |
| R Square | 0.084182 |
| Adjusted R Square | 0.056979 |
| Standard Error | 10261.9 |
| Observations | 105 |

| Variable | Coefficient | Standard error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 36596.22 | 1974.901 | 18.53066 | 2.88E-34 |
| D2 | 5444.97 | 2819.659 | 1.931074 | 0.056277 |
| D3 | -2616.22 | 2819.659 | -0.92785 | 0.355697 |
| D4 | -1299.49 | 2819.659 | -0.46087 | 0.645884 |

$$\widehat{\text{NCARREG}} = 36596.22 + 5444.97\,D_2 - 2616.22\,D_3 - 1299.49\,D_4$$
How good is the model? Is deterministic seasonality observed? Is it true that spring is the high season for new car registrations?
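The model is weak: the quarterly dummies explain only about 8.4% of the variation in new car registrations (R² = 0.084, adjusted R² = 0.057). Some deterministic seasonality is observed, but it is concentrated in the second quarter. Relative to the first quarter (the reference category captured by the intercept), the dummy D2 is positive (+5,444.97) and significant at the 10% level (p = 0.056), whereas D3 (p = 0.356) and D4 (p = 0.646) are not significant. The second quarter (April to June) therefore has the highest average registrations, which supports spring as the high season, although only at the 10% level.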
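Because Q1 is the omitted quarter, each quarter's implied mean is the intercept plus its dummy coefficient. The sketch below, using the reported coefficients, makes the seasonal pattern explicit.

```python
# Quarterly dummy model: Q1 is the reference quarter captured by the intercept.
INTERCEPT = 36596.22
DUMMIES = {"Q2": 5444.97, "Q3": -2616.22, "Q4": -1299.49}

# Implied mean registrations per quarter = intercept + dummy coefficient.
means = {"Q1": INTERCEPT, **{q: INTERCEPT + b for q, b in DUMMIES.items()}}
for quarter, m in means.items():
    print(quarter, round(m, 2))

high_season = max(means, key=means.get)
print("highest quarter:", high_season)  # Q2
```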
2c: Examine for deterministic monthly seasonality
Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.344883 |
| R Square | 0.118944 |
| Adjusted R Square | 0.086958 |
| Standard Error | 3669.45 |
| Observations | 315 |

| Variable | Coefficient | Standard error | t Stat | P-value |
|---|---|---|---|---|
| Intercept | 11182.37 | 706.1864 | 15.83487 | 1.44E-41 |
| D2 | -114.037 | 998.6984 | -0.11419 | 0.909166 |
| D3 | 3163.148 | 998.6984 | 3.167271 | 0.001696 |
| D4 | 1898.36 | 1008.256 | 1.882817 | 0.060683 |
| D5 | 3208.13 | 1008.256 | 3.181862 | 0.001615 |
| D6 | 3387.591 | 1008.256 | 3.359854 | 0.00088 |
| D7 | 185.9758 | 1008.256 | 0.184453 | 0.853781 |
| D8 | -111.255 | 1008.256 | -0.11034 | 0.91221 |
| D9 | 358.1681 | 1008.256 | 0.355235 | 0.72266 |
| D10 | 723.5142 | 1008.256 | 0.71759 | 0.473563 |
| D11 | 527.1681 | 1008.256 | 0.522852 | 0.60146 |
| D12 | 498.9373 | 1008.256 | 0.494852 | 0.621063 |

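$$
\begin{aligned}
\widehat{\text{NCARREG}} ={}& 11182.37 - 114.037\,D_2 + 3163.148\,D_3 + 1898.36\,D_4 + 3208.13\,D_5 + 3387.591\,D_6 \\
&+ 185.9758\,D_7 - 111.255\,D_8 + 358.1681\,D_9 + 723.5142\,D_{10} + 527.1681\,D_{11} + 498.9373\,D_{12}
\end{aligned}
$$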

### Is it true that March, May, and June are special?

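Yes. With January as the reference month (captured by the intercept), each coefficient measures how much higher or lower average monthly registrations are than in January. March (+3,163.15, p = 0.0017), May (+3,208.13, p = 0.0016) and June (+3,387.59, p = 0.0009) have the largest positive effects and are the only months significant at the 1% level, with June the highest. April (+1,898.36, p = 0.061) is significant only at the 10% level, and the remaining months do not differ significantly from January.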
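The same reading can be automated: the sketch filters the month dummies by p-value using the reported output. It assumes, as the specification implies, that D2 to D12 stand for February to December with January as the reference month.

```python
# Month dummies D2..D12 (January is the reference month), from the monthly output.
MONTHS = ["Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
MONTH_COEF = [-114.037, 3163.148, 1898.36, 3208.13, 3387.591, 185.9758,
              -111.255, 358.1681, 723.5142, 527.1681, 498.9373]
PVAL = [0.909166, 0.001696, 0.060683, 0.001615, 0.00088, 0.853781,
        0.91221, 0.72266, 0.473563, 0.60146, 0.621063]


def significant(alpha: float) -> list[str]:
    """Months whose mean differs from January at the given significance level."""
    return [m for m, p in zip(MONTHS, PVAL) if p < alpha]


print(significant(0.05))  # ['Mar', 'May', 'Jun']
print(significant(0.10))  # ['Mar', 'Apr', 'May', 'Jun']
peak = MONTHS[max(range(len(MONTH_COEF)), key=MONTH_COEF.__getitem__)]
print("peak month:", peak)  # Jun
```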
Compare also with the model estimated on the quarterly data in question A. Is any information lost by using only the model estimated under question A?

#### Hypothesis test on distracted driving

Ho: the mean level of distracted driving is the same on all three road types (μ(cities) = μ(land) = μ(motorways))
H1: the mean level of distracted driving differs for at least one road type
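Summary

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Cities | 16 | 147 | 9.1875 | 2.9665 |
| Land | 16 | 109.8 | 6.8625 | 2.0665 |
| Motorways | 16 | 254.8 | 15.925 | 8.539333333 |

ANOVA

| Source of variation | SS | df | MS | F | P-value |
|---|---|---|---|---|---|
| Between groups | 708.9517 | 2 | 354.4758333 | 78.35259228 | 2.19355E-15 |
| Within groups | 203.585 | 45 | 4.524111111 |  |  |
| Total | 912.5367 | 47 |  |  |  |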

The one-way ANOVA gives F = 78.35 with a p-value of 2.19E-15, far below the 0.05 significance level, so we reject the null hypothesis of equal means. Distracted driving therefore differs by road type, and the group means show that it is most frequent on motorways (mean 15.93), compared with cities (9.19) and land (6.86). Motorways also show the largest variability (variance 8.54).
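The one-way ANOVA can be reproduced from the group summary statistics alone. This sketch (the function name `one_way_anova` is illustrative) recovers the between-group sum of squares, the within-group sum of squares and the F statistic in the table; the p-value would additionally require the F distribution, for example from a statistics library.

```python
def one_way_anova(counts, means, variances):
    """Sums of squares and F statistic of a one-way ANOVA from group summaries."""
    n_total = sum(counts)
    grand = sum(n * m for n, m in zip(counts, means)) / n_total
    ss_between = sum(n * (m - grand) ** 2 for n, m in zip(counts, means))
    ss_within = sum((n - 1) * v for n, v in zip(counts, variances))
    df_between = len(counts) - 1
    df_within = n_total - len(counts)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f


# Cities, land and motorways, from the summary table.
ssb, ssw, f_stat = one_way_anova(
    counts=[16, 16, 16],
    means=[9.1875, 6.8625, 15.925],
    variances=[2.9665, 2.0665, 8.539333333],
)
print(round(ssb, 4), round(ssw, 3), round(f_stat, 2))  # 708.9517 203.585 78.35
```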

## ANOVA Analysis

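ANOVA

| Source of variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Sample (segments) | 149.4317 | 3 | 49.81056 | 50.51211 | 5.54E-13 | 2.866266 |
| Columns (road types) | 708.9517 | 2 | 354.4758 | 359.4685 | 1.63E-24 | 3.259446 |
| Interaction | 18.65333 | 6 | 3.108889 | 3.152676 | 0.013741 | 2.363751 |
| Within | 35.5 | 36 | 0.986111 |  |  |  |
| Total | 912.5367 | 47 |  |  |  |  |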

Describe the method and set up the hypotheses behind the test.
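The method is a two-way ANOVA with replication: the 48 observations are classified by segment of countries (4 levels, the "Sample" rows of the table) and by road type (3 levels, the "Columns"), with four observations per cell, which allows both main effects and their interaction to be tested.
Ho: distracted driving is equal in all segments (μ1 = μ2 = μ3 = μ4)
H1: distracted driving differs in at least one segment (not all μi are equal)
where μ1 is the mean percentage of distracted driving in Scandinavia, and μ2, μ3 and μ4 are the corresponding means for small Western countries, large Western countries and East European countries. Corresponding hypotheses apply to road type (equal means for cities, land and motorways) and to the interaction (the road-type effect is the same in every segment).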

#### What is the outcome? Do we observe interaction or segmentation among the two factors?

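Both segmentation and interaction are present. The segment effect ("Sample") is highly significant (F = 50.51, p = 5.54E-13), and the interaction term is significant at the 5% level (F = 3.15 > F crit = 2.36, p = 0.0137), meaning that the difference between road types is not the same in every segment of countries.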

#### What is the interpretation?

Each p-value is compared with the 0.05 significance level (equivalently, each F statistic with its F crit). For road type ("Columns"), F = 359.47 far exceeds F crit = 3.26 and p = 1.63E-24 < 0.05, so the hypothesis of equal distracted driving across road types is rejected. The same conclusion holds for segments (p = 5.54E-13) and for the interaction (p = 0.0137).
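Each F statistic in the two-way table is the effect's mean square divided by the within-cell mean square. The sketch below recomputes the three F ratios from the reported SS and df and compares them with the reported F crit values.

```python
# Sums of squares, degrees of freedom and critical values from the two-way ANOVA table.
SS = {"Sample": 149.4317, "Columns": 708.9517, "Interaction": 18.65333}
DF = {"Sample": 3, "Columns": 2, "Interaction": 6}
F_CRIT = {"Sample": 2.866266, "Columns": 3.259446, "Interaction": 2.363751}
MS_WITHIN = 35.5 / 36  # within-cell mean square, the common denominator

f_values = {s: (SS[s] / DF[s]) / MS_WITHIN for s in SS}
for source, f in f_values.items():
    verdict = "reject H0" if f > F_CRIT[source] else "do not reject H0"
    print(f"{source}: F = {f:.2f} vs F crit = {F_CRIT[source]:.2f} -> {verdict}")
```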

#### Which group(s)/segments of countries are of interest with regard to both factors?

| Segment | Cities | Land | Motorways |
|---|---|---|---|
| 1 | 7.1 | 5.3 | 12.4 |
| 2 | 8.8 | 6.4 | 15.0 |
| 3 | 10.0 | 6.7 | 17.0 |
| 4 | 10.9 | 9.0 | 19.4 |

The table shows the mean incidence of distracted driving in each cell. Segments 3 and 4 (large Western countries and East European countries) are of most interest with regard to both factors: they have the highest means for every road type, peaking on motorways at 17.0 and 19.4.

#### Can ranking be undertaken?

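Yes, at least descriptively. Within every road type the means increase from segment 1 to segment 4 (for example, 12.4, 15.0, 17.0 and 19.4 on motorways), and within every segment motorways rank highest, followed by cities and then land. The ranking is therefore consistent: segment 4 > 3 > 2 > 1, and motorways > cities > land.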
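Both orderings can be checked mechanically from the cell means. A small sketch:

```python
# Cell means of distracted driving: rows are segments 1-4, columns are road types.
ROADS = ["Cities", "Land", "Motorways"]
MEANS = {
    1: [7.1, 5.3, 12.4],
    2: [8.8, 6.4, 15.0],
    3: [10.0, 6.7, 17.0],
    4: [10.9, 9.0, 19.4],
}

# Within each road type, do the means rise strictly from segment 1 to 4?
segments_ordered = all(
    MEANS[s][j] < MEANS[s + 1][j] for j in range(len(ROADS)) for s in range(1, 4)
)

# Within each segment, rank road types from highest to lowest mean.
road_rankings = {}
for s, row in MEANS.items():
    order = sorted(range(len(ROADS)), key=lambda j: row[j], reverse=True)
    road_rankings[s] = [ROADS[j] for j in order]

print(segments_ordered)   # True
print(road_rankings[1])   # ['Motorways', 'Cities', 'Land']
```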
Examine the symmetry of the dataset and conduct the Bowman-Shenton test. Set up the hypotheses, describe, and perform the test.
H0: the data follow a normal distribution.
H1: the data do not follow a normal distribution.
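The Bowman-Shenton statistic is defined as

$$BS = n\left[\frac{S^2}{6} + \frac{(K-3)^2}{24}\right]$$

where n is the sample size, S the skewness and K the kurtosis.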

| Statistic | Value |
|---|---|
| Skewness | -2.13823 |
| Kurtosis | 4.62957 |
| Bowman-Shenton test | 21.8164 |

Under H0, the Bowman-Shenton statistic is asymptotically chi-square distributed with 2 degrees of freedom, whose critical value at the 5% significance level is 5.99. Since BS = 21.82 > 5.99, we reject the null hypothesis and conclude that the data do not follow a normal distribution. The data are skewed, as the skewness of -2.138 (a long left tail) confirms.
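The Bowman-Shenton statistic and its chi-square test can be computed with the standard library alone, because the chi-square distribution with 2 degrees of freedom has the closed-form survival function exp(-x/2). The sketch assumes n = 25 (from the source's "n - 1 = 24") and treats the reported kurtosis as non-excess kurtosis, which is the reading that reproduces BS = 21.8164; the function name is illustrative.

```python
import math


def bowman_shenton(n: int, skewness: float, kurtosis: float) -> float:
    """BS = n * (S^2 / 6 + (K - 3)^2 / 24), with K the non-excess kurtosis."""
    return n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)


bs = bowman_shenton(n=25, skewness=-2.13823, kurtosis=4.62957)

# Chi-square with 2 df: P(X > x) = exp(-x / 2), so the p-value and the
# 5% critical value both have closed forms.
p_value = math.exp(-bs / 2)
critical_5pct = -2 * math.log(0.05)
print(round(bs, 2), round(critical_5pct, 2), p_value < 0.05)  # 21.82 5.99 True
```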
Set up a sign test and examine a hypothesis stating that the median is higher than 13,000. Set up the hypotheses, describe and perform the test.
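Ho: median ≤ 13,000
H1: median > 13,000
The sign test compares each observation with 13,000 and counts the signs of the differences. Under H0 (at the boundary, median = 13,000), each sign is positive with probability 0.5, so the number of positive signs X follows a Binomial(25, 0.5) distribution.
Sign test

| Statistic | Value |
|---|---|
| Positive signs (above 13,000) | 19 |
| Negative signs (below 13,000) | 6 |
| Observed proportion positive (P) | 0.76 |
| Observed proportion negative (Q) | 0.24 |
| One-sided p-value, P(X ≥ 19) under Binomial(25, 0.5) | 0.0073 |

Since the p-value of 0.0073 is below 0.05, we reject the null hypothesis and conclude that the median is higher than 13,000.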
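The one-sided sign-test p-value is a binomial tail probability with success probability 0.5. A minimal sketch using only `math.comb` (the function name is illustrative):

```python
from math import comb


def sign_test_upper(n_pos: int, n_neg: int) -> float:
    """One-sided sign-test p-value for H1: median > m0.

    Under H0 each non-zero difference is positive with probability 0.5, so the
    p-value is P(X >= n_pos) for X ~ Binomial(n_pos + n_neg, 0.5).
    """
    n = n_pos + n_neg
    return sum(comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n


p = sign_test_upper(n_pos=19, n_neg=6)
print(round(p, 4))  # 0.0073
```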

#### Analysis of incomes

5A
DKINCOME

| Statistic | Value |
|---|---|
| Mean | 526.638 |
| Standard Error | 17.1944 |
| Median | 528 |
| Mode | 600 |
| Standard Deviation | 343.888 |
| Kurtosis | 11.7271 |
| Skewness | 2.67417 |
| Range | 2662 |
| Sum | 210655 |
| Count | 400 |
| Confidence Level (95.0%) | 33.803 |

The data set has a mean of 526.64 and a median of 528. The distribution is clearly not normal: skewness is 2.674, indicating a long right tail, and kurtosis is 11.73, far above that of a normal distribution. The histogram below confirms that the data are skewed to one side.
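Frequency table

| Class | Frequency |
|---|---|
| 0-160 | 27 |
| 161-320 | 88 |
| 321-480 | 73 |
| 481-640 | 111 |
| 641-800 | 65 |
| 801-960 | 20 |
| 961-1120 | 4 |
| 1121-1280 | 1 |
| 1281-1440 | 1 |
| 1441-1600 | 1 |
| 1601-1760 | 1 |
| 1761-1920 | 1 |
| 1921-2080 | 1 |
| 2081-2240 | 3 |
| 2241-2400 | 1 |
| 2401-2560 | 1 |
| 2561-2720 | 1 |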

The modal class is 481-640 with 111 observations (about 28% of the 400), followed by 161-320 with 88. Above 960 the frequencies fall to between one and four per class while extending up to 2,720, a long right tail that confirms the positive skewness.
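A short sketch that summarises the frequency table: total count, modal class and the share of observations in the right tail above 960.

```python
# Class frequencies of DKINCOME from the frequency table.
FREQ = {
    "0-160": 27, "161-320": 88, "321-480": 73, "481-640": 111, "641-800": 65,
    "801-960": 20, "961-1120": 4, "1121-1280": 1, "1281-1440": 1,
    "1441-1600": 1, "1601-1760": 1, "1761-1920": 1, "1921-2080": 1,
    "2081-2240": 3, "2241-2400": 1, "2401-2560": 1, "2561-2720": 1,
}

total = sum(FREQ.values())
modal_class = max(FREQ, key=FREQ.get)
share_modal = FREQ[modal_class] / total
# Share of observations in classes starting above 960 (the long right tail).
upper_tail = sum(f for c, f in FREQ.items() if int(c.split("-")[0]) > 960) / total
print(total, modal_class, round(share_modal, 4), round(upper_tail, 3))
```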
Use descriptive statistics and a discussion of the shape of the samples compared to the distribution of the total data set.
Descriptive statistics by sample

| Statistic | Simple random sample | Stratified/systematic sampling | Total sample |
|---|---|---|---|
| Mean | 557.575 | 516.925 | 526.6375 |
| Standard Error | 62.92947709 | 32.46425 | 17.19441727 |
| Median | 510 | 540 | 528 |
| Mode | 600 | 600 | 600 |
| Standard Deviation | 398.0009591 | 205.3219 | 343.8883454 |
| Sample Variance | 158404.7635 | 42157.1 | 118259.1941 |
| Kurtosis | (not reported) | -0.42397 | 11.72712285 |
| Skewness | 2.986920769 | -0.13469 | 2.674171798 |
| Range | 2032 | 840 | 2662 |
| Minimum | 68 | 120 | 18 |
| Maximum | 2100 | 960 | 2680 |
| Sum | 22303 | 20677 | 210655 |
| Count | 40 | 40 | 400 |

The simple random sample and the total data set are both strongly positively skewed (skewness 2.987 and 2.674 respectively), with long right tails reaching maxima of 2,100 and 2,680, as the histogram also shows. The stratified/systematic sample, by contrast, has skewness close to zero (-0.135) and a much narrower range (120 to 960), so it is roughly symmetric and contains none of the extreme high incomes. Its standard deviation (205.3) is therefore well below that of the total data set (343.9), whereas the simple random sample (398.0) is more dispersed. In terms of outliers, the total data set and the simple random sample show high outliers, while the stratified sample does not. The stratified sample mean (516.9) is also closer to the total mean (526.6) than the simple random sample mean (557.6).
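To put the sample means in context, the sketch below computes each sample's standard error as SD / √n and the gap between its mean and the full-data mean. It applies the simple-random-sampling formula to both samples, which is only an approximation for the stratified/systematic sample.

```python
import math

# Descriptive statistics from the comparison table: (mean, SD, n).
SAMPLES = {
    "simple random": (557.575, 398.0009591, 40),
    "stratified/systematic": (516.925, 205.3219, 40),
}
TOTAL_MEAN = 526.6375

# For each sample: standard error of the mean and distance from the full-data mean.
results = {
    name: (sd / math.sqrt(n), mean - TOTAL_MEAN)
    for name, (mean, sd, n) in SAMPLES.items()
}
for name, (se, gap) in results.items():
    print(f"{name}: SE = {se:.2f}, mean - total mean = {gap:+.2f} ({gap / se:+.2f} SE)")
```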