Quantitative analysis

Quantitative analysis is the process of collecting measurable data and evaluating it. The data can be, for example, market shares, revenues, or wages. It helps people understand how their businesses perform. In the past, decision makers relied mainly on experience; today they use quantitative analysis to measure performance and support decisions.

Matrix of correlation

 

           NCARREG    IMPORTS    RATE
NCARREG    1
IMPORTS    0.653606   1
RATE       -0.69246   -0.90112   1

There is a fairly high positive correlation (0.65) between imports and new car registrations: over the period 1992Q1 to 2018Q1, an increase in the number of registered cars goes together with an increase in the value of imports. By contrast, the rate is negatively correlated with both new car registrations (-0.69) and imports (-0.90), so the rate tends to move in the opposite direction to imports and registrations, a pattern that holds over the whole study period.
This is also visible in the time series graphs below.
time series graphs
Estimate a regression model of the following form for the period 1992Q1 to 2018Q1:
Regression Statistics
Multiple R           0.69714
R Square             0.486004
Adjusted R Square    0.470736
Standard Error       7687.808
Observations         105

             Coefficients   Standard Error   t Stat     P-value
Intercept    39302.54       9449.291         4.159311   6.71E-05
IMPORTS      14.2145        85.54235         0.166169   0.868356
RATE         -1889.57       1068.645         -1.7682    0.080048
TREND        78.22917       130.3307         0.600236   0.549693

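NCARREG = 39302.54 + 14.2145 IMPORTS - 1889.57 RATE + 78.22917 TREND
Holding the other variables constant, a one-unit increase in imports is associated with 14.21 more new car registrations and each additional period of the trend with 78.23 more, while a one-unit increase in the rate is associated with 1889.57 fewer registrations. Only the rate is significant at the 10 percent level (p = 0.080); imports (p = 0.868) and the trend (p = 0.550) are not.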
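As a quick consistency check on the regression output, each t statistic should equal the coefficient divided by its standard error. The minimal Python sketch below recomputes them from the reported values; the name `model_b` is only an illustrative choice and is not part of the original output.

```python
# Reported (coefficient, standard error) pairs from the regression table.
model_b = {
    "Intercept": (39302.54, 9449.291),
    "IMPORTS": (14.2145, 85.54235),
    "RATE": (-1889.57, 1068.645),
    "TREND": (78.22917, 130.3307),
}

# t statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se) in model_b.items()}

for name, t in t_stats.items():
    print(f"{name:10s} t = {t:.4f}")
```

The recomputed values can be compared with the t Stat column to confirm that the flattened table has been read correctly.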
To improve the model, dynamics are introduced. The two variables that explain car registrations are lagged by 4 periods, equivalent to one year. The model to be estimated for the period 1993Q1 to 2018Q1 is now of the form:
Regression Statistics
Multiple R           0.679677
R Square             0.461961
Adjusted R Square    0.445321
Standard Error       7695.466
Observations         101

               Coefficients   Standard Error   t Stat     P-value
Intercept      49344.56       10777.07         4.578662   1.39E-05
TREND          41.77199       146.8604         0.284433   0.776685
IMPORTS LAG    -25.4673       92.8713          -0.27422   0.784497
RATELAG        -2799.19       1243.621         -2.25084   0.026653

NCARREG = 49344.56 - 25.4673 IMPORTS LAG - 2799.19 RATE LAG + 41.7719 TREND
With the explanatory variables lagged by four quarters, new car registrations are negatively related to both lagged imports and the lagged rate, and positively related to the trend. Only the lagged rate is significant (p = 0.027).
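Lagging by four quarters pairs each observation with the value of the explanatory variable one year earlier, which removes the first four quarters from the estimation sample. The sketch below illustrates this with quarter labels standing in for the actual series; the helper `lag` is a hypothetical name used only for this illustration.

```python
def lag(values, k):
    """Pair each observation from period k onward with the value k periods earlier."""
    return [(values[t], values[t - k]) for t in range(k, len(values))]

# Quarter labels 1992Q1 .. 2018Q1 stand in for the actual series (105 quarters).
quarters = [f"{1992 + i // 4}Q{i % 4 + 1}" for i in range(105)]
pairs = lag(quarters, 4)

print(len(pairs))   # number of usable observations after lagging
print(pairs[0])     # first usable quarter and the quarter it is paired with
```

This is why the number of observations drops from 105 in the first model to 101 in the lagged model.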
Conduct a model control and investigate for the presence of autocorrelation using the Durbin-Watson test. Save the residuals from this regression and call this variable RESID. Comment on the sign of the included variables. Assume a 10 percent level of significance. Also compare with the model found under question B.
SUM OF SQUARED RESIDUALS         5969341438
SUM OF SQUARED DIFF RESIDUALS    4700260869
DURBIN WATSON                    0.78740024

The Durbin-Watson statistic of 0.787 is well below 2, which indicates positive autocorrelation: residuals in neighbouring periods are strongly similar to each other.
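The Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. The sketch below reproduces the reported value from the two sums and, as an illustration only, applies the same formula to two made-up residual series; the function name `durbin_watson` is an assumption for this example.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    diff_sq = sum((residuals[t] - residuals[t - 1]) ** 2
                  for t in range(1, len(residuals)))
    resid_sq = sum(e ** 2 for e in residuals)
    return diff_sq / resid_sq

# Ratio of the two sums reported in the table.
dw_reported = 4700260869 / 5969341438
print(round(dw_reported, 4))

# Toy residuals (made up for illustration): alternating signs push DW above 2,
# a slowly drifting series pushes it towards 0.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(durbin_watson([1.0, 1.1, 1.2, 1.3]))
```

Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation.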
The lagged model has a multiple R of 0.68 and an R Square of 0.462, compared with a multiple R of 0.70 and an R Square of 0.486 for the model in B. It therefore explains about 46% of the variability within the data set, against about 49% for the model in B.
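The share of variability explained is R Square, which is the square of Multiple R, and Adjusted R Square corrects it for the number of regressors. The following sketch recomputes both from the reported figures, assuming three regressors plus an intercept in each model; the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# (multiple R, observations, regressors) as reported for the two models.
models = {"model B": (0.69714, 105, 3), "lagged model": (0.679677, 101, 3)}

for name, (multiple_r, n, k) in models.items():
    r2 = multiple_r ** 2
    print(name, round(r2, 4), round(adjusted_r2(r2, n, k), 4))
```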
Expand the initial model above to an autoregressive (AR) model of order 4. This is an AR-4 model for the period 1994Q1 to 2018Q1. Estimate a model of the form
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.838415552
R Square             0.702940638
Adjusted R Square    0.67903931
Standard Error       5443.022059
Observations         95

               Coefficients    Standard Error   t Stat     P-value
Intercept      47268.57699     7963.301         5.935802   5.83E-08
IMPORTS LAG    -58.76740547    67.89612         -0.86555   0.389118
RATELAG        -2263.925707    925.0793         -2.44728   0.016404
t-1            0.368681017     0.096027         3.839331   0.000234
t-2            0.025210739     0.10695          0.235724   0.814201
t-3            -0.028588272    0.106814         -0.26765   0.789605
t-4            0.475936119     0.097676         4.87262    4.9E-06
Trend          113.0254443     107.3485         1.052883   0.29531

NCARREG = 47268.57 - 58.767 IMPORTS LAG - 2263.92 RATE LAG + 113.025 TREND + 0.368 RESID(t-1) + 0.0252 RESID(t-2) - 0.0285 RESID(t-3) + 0.4759 RESID(t-4)

Why is it interesting for this data set to consider a model of order 4 relative to quarterly statistics?

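With quarterly data, a lag of four periods corresponds to the same quarter one year earlier, so a model of order 4 can pick up annual seasonal patterns. In the estimated model the first and fourth lags are significant (p = 0.000234 and p = 4.9E-06), and the fit improves: the multiple R is 0.838 and R Square is 0.703, so about 70% of the variance in the data is explained.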
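SUM OF SQUARED DIFFERENCES    3535804788
SUM OF SQUARED RESIDUALS      2577504555
DURBIN WATSON STAT            1.37179

The Durbin-Watson statistic is the sum of squared differences divided by the sum of squared residuals, 3535804788 / 2577504555 = 1.372; dividing the other way round gives 0.729. The statistic is still below 2, which points to some remaining positive autocorrelation between neighbouring residuals, although less than the 0.787 found before the autoregressive terms were added.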
RESIDUAL PLOT
Imports lag Residual plot
1E

             Coefficients   Standard Error   t Stat     P-value
Intercept    9.194577       0.452701         20.31051   1.26E-37
LIMPORTS     0.306131       0.092492         3.309828   0.001291
LRATE        -0.12427       0.028566         -4.35023   3.23E-05

LNCARREG = 9.194577 + 0.306131 LIMPORTS - 0.12427 LRATE
Save the residuals from this model and call them ECMRESID. Next, estimate the short-run ECM model.
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.034987219
R Square             0.001224105
Adjusted R Square    -0.028442505
Standard Error       0.897380716
Observations         105

                   Coefficients    Standard Error   t Stat     P-value
Intercept          0.608732812     0.09918          6.137651   1.66E-08
LRATE4TH DIFF      0.040354728     0.157096         0.25688    0.797794
LIMPORT4TH DIFF    -0.349327418    1.501216         -0.2327    0.816468
ECMRESID           0.054029855     0.471968         0.114478   0.909086

How good is the model, and how is the model control?

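The model is not a good fit. The coefficient on ECMRESID is positive (0.054) and insignificant (p = 0.909), whereas an error-correction term should be negative and significant if deviations from the long-run relationship are to be corrected. None of the differenced variables is significant, and R Square is only 0.0012.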

Regression statistics

Regression Statistics
Multiple R           0.290141
R Square             0.084182
Adjusted R Square    0.056979
Standard Error       10261.89
Observations         105

             Coefficients   Standard Error   t Stat     P-value
Intercept    36596.22       1974.901         18.53066   2.88E-34
D2           5444.97        2819.659         1.931074   0.056277
D3           -2616.22       2819.659         -0.92785   0.355697
D4           -1299.49       2819.659         -0.46087   0.645884

NCARREG = 36596.22 + 5444.97 D2 - 2616.22 D3 - 1299.49 D4
How good is the model? Is deterministic seasonality observed? Is it true that spring is the high season for new car registrations?
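Deterministic seasonality is only weakly supported. The dummy D2 (taken here to be the second quarter) has a positive coefficient of 5444.97 and is significant at the 10 percent level (p = 0.056) but not at the 5 percent level, while D3 and D4 are not significant and the model explains only about 8 percent of the variation (R Square = 0.084). The positive D2 coefficient is consistent with spring being the high season for new car registrations.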
2c: Examine for deterministic monthly seasonality
Regression Statistics
Multiple R           0.344883
R Square             0.118944
Adjusted R Square    0.086958
Standard Error       3669.452
Observations         315

             Coefficients   Standard Error   t Stat     P-value
Intercept    11182.37       706.1864         15.83487   1.44E-41
D2           -114.037       998.6984         -0.11419   0.909166
D3           3163.148       998.6984         3.167271   0.001696
D4           1898.36        1008.256         1.882817   0.060683
D5           3208.13        1008.256         3.181862   0.001615
D6           3387.591       1008.256         3.359854   0.00088
D7           185.9758       1008.256         0.184453   0.853781
D8           -111.255       1008.256         -0.11034   0.91221
D9           358.1681       1008.256         0.355235   0.72266
D10          723.5142       1008.256         0.71759    0.473563
D11          527.1681       1008.256         0.522852   0.60146
D12          498.9373       1008.256         0.494852   0.621063

NCARREG = 11182.37 - 114.037 D2 + 3163.148 D3 + 1898.36 D4 + 3208.13 D5 + 3387.591 D6 + 185.98 D7 - 111.255 D8 + 358.1681 D9 + 723.5142 D10 + 527.1681 D11 + 498.9373 D12

Is it true that March, May, and June are special?

Yes. March (D3), May (D5), and June (D6) have the largest coefficients, and all three are statistically significant (p-values of 0.0017, 0.0016, and 0.0009). Relative to the omitted base month, registrations are higher by about 3163 in March, 3208 in May, and 3388 in June, the highest of the three. None of the other monthly dummies is significant at the 5 percent level.
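In a dummy-variable regression the intercept is the mean of the omitted category, and each coefficient is the difference from that mean. The sketch below turns the reported coefficients into implied monthly means and ranks them; it assumes that the omitted base month is month 1 (labelled `D1` here), which the output does not state explicitly.

```python
intercept = 11182.37  # mean of the omitted base month (assumed to be month 1)
dummies = {
    "D2": -114.037, "D3": 3163.148, "D4": 1898.36, "D5": 3208.13,
    "D6": 3387.591, "D7": 185.9758, "D8": -111.255, "D9": 358.1681,
    "D10": 723.5142, "D11": 527.1681, "D12": 498.9373,
}

# Implied monthly mean = intercept + dummy coefficient.
monthly_mean = {"D1": intercept}
monthly_mean.update({d: intercept + c for d, c in dummies.items()})

# Months ranked from highest to lowest implied mean.
ranking = sorted(monthly_mean, key=monthly_mean.get, reverse=True)
print(ranking[:3])
```

The three months at the top of the ranking are the ones singled out in the question.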
Compare also with the model estimated on the quarterly data in question A. Is any information lost by using only the model estimated under question A?

Hypothesis of distractive driving

Ho: distractive driving is more likely to occur on motorways
H1: distractive driving is NOT likely to occur on motorways
SUMMARY
Groups       Count   Sum     Average   Variance
Cities       16      147     9.1875    2.9665
Land         16      109.8   6.8625    2.0665
Motorways    16      254.8   15.925    8.539333333

ANOVA
Source of Variation   SS         df   MS            F             P-value
Between Groups        708.9517   2    354.4758333   78.35259228   2.19355E-15
Within Groups         203.585    45   4.524111111

Total                 912.5367   47



The ANOVA tests whether the mean level of distractive driving is the same in cities, on land roads, and on motorways. The p-value of 2.19E-15 is far below the 0.05 significance level, so equality of the three means is rejected. The motorway mean (15.925) is clearly higher than the means for cities (9.1875) and land roads (6.8625), so the data are consistent with the stated hypothesis that distractive driving is more likely to occur on motorways. The motorway group also has the largest variance (8.54), which serves as supplementary evidence that this group stands apart.
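The F statistic in the one-way ANOVA can be recovered from the SUMMARY table alone, since the between-groups sum of squares depends only on the group means and the within-groups sum of squares only on the group variances. The sketch below is a minimal recomputation under that reading of the table; the variable names are illustrative.

```python
# (count, mean, variance) per group, from the SUMMARY table.
groups = {
    "Cities": (16, 9.1875, 2.9665),
    "Land": (16, 6.8625, 2.0665),
    "Motorways": (16, 15.925, 8.539333333),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups and within-groups sums of squares.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(ss_between, 4), round(ss_within, 3), round(f_stat, 2))
```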

ANOVA Analysis

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Sample                149.4317   3    49.81056   50.51211   5.54E-13   2.866266
Columns               708.9517   2    354.4758   359.4685   1.63E-24   3.259446
Interaction           18.65333   6    3.108889   3.152676   0.013741   2.363751
Within                35.5       36   0.986111

Total                 912.5367   47




Describe the method and set up the hypotheses behind the test.
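The test is a two-way analysis of variance with the country segment and the road type as the two factors.
Ho: Distractive performance is equal in all the segments (U1 = U2 = U3 = U4)
H1: Distractive performance differs between segments (at least one of U1, U2, U3, U4 differs from the others)
Where U1 is the mean percentage of distractive performance in Scandinavia, and U2, U3, and U4 are the means for small Western countries, large Western countries, and East European countries respectively.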

 What is the outcome? Do we observe interaction or segmentation among the two factors?

Interaction between the two factors is observed: the interaction term has a p-value of 0.0137, which is below 0.05 and therefore statistically significant.

What is the interpretation?

The p-values in the table are used to draw the conclusions. The four segments (Sample, 3 degrees of freedom) have a p-value of 5.54E-13 and the three road types (Columns, 2 degrees of freedom) a p-value of 1.63E-24. Both are far below the 0.05 significance level, so equality of distractive performance is rejected across segments as well as across road types.
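Each F ratio in the two-way table is the mean square of the effect divided by the mean square within. The sketch below recomputes the three ratios from the reported sums of squares and degrees of freedom; the names used are illustrative only.

```python
# (sum of squares, degrees of freedom) from the two-way ANOVA table.
effects = {
    "Sample": (149.4317, 3),
    "Columns": (708.9517, 2),
    "Interaction": (18.65333, 6),
}
ss_within, df_within = 35.5, 36
ms_within = ss_within / df_within

# F = mean square of the effect / mean square within.
f_ratios = {name: (ss / df) / ms_within for name, (ss, df) in effects.items()}
for name, f in f_ratios.items():
    print(f"{name:12s} F = {f:.4f}")
```

Each ratio can then be compared with its F crit value: an effect is significant at the 5 percent level when F exceeds F crit.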

Which group(s)/segments of countries are of interest with regard to both factors?

Segm   Cities   Land   Motorways
1      7.1      5.3    12.4
2      8.8      6.4    15.0
3      10.0     6.7    17.0
4      10.9     9.0    19.4

The table shows the average incidence of distractive performance. Segments 3 and 4 are of most interest with regard to both factors, since they have the largest means for every road type.

Can ranking be undertaken?

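Yes. Based on the tabulated means, ranking can be undertaken: for every road type the mean rises from segment 1 to segment 4, and within every segment motorways rank highest and land roads lowest.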
Examine the symmetry of the dataset and conduct the Bowman-Shenton test. Set up the hypotheses, describe, and perform the test.
H0: the data follow a normal distribution.
H1: the data does not follow a normal distribution.
The Bowman-Shenton statistic is computed from the following values:
Bowman-Shenton statistics
Skewness               -2.13823
Kurtosis               4.629575
Bowman-Shenton test    21.81636

Under the null hypothesis the Bowman-Shenton statistic approximately follows a chi-square distribution with 2 degrees of freedom, for which the critical value at the 5 percent significance level is 5.99. Since 21.82 > 5.99, we reject the null hypothesis and conclude that the data do not follow a normal distribution. The data are skewed, as is also evident from the skewness value of -2.13823.
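The reported statistic is consistent with the usual Bowman-Shenton formula, B = n (skewness^2 / 6 + (kurtosis - 3)^2 / 24). The sketch below evaluates it under two assumptions that are not stated in the output: that the sample has 25 observations (the sign test on the same data counts 19 + 6), and that the reported kurtosis is the raw value rather than the excess.

```python
def bowman_shenton(n, skewness, kurtosis):
    """B = n * (skewness^2 / 6 + (kurtosis - 3)^2 / 24)."""
    return n * (skewness ** 2 / 6 + (kurtosis - 3) ** 2 / 24)

# n = 25 is an assumption taken from the sign-test counts (19 + 6).
b = bowman_shenton(25, -2.13823, 4.629575)
print(round(b, 3))
```

Under these assumptions the result agrees with the tabulated Bowman-Shenton value to three decimals.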
Set up a sign-test and examine a hypothesis stating that the median is higher than 13,000. Set up the hypotheses, describe and perform the test.
Ho: Median > 13000
H1: Median < 13000
Sign test
SUCCESS (positive)      19
FAILURE (negative)      6
Q                       0.24
P                       0.76
BINOMIAL PROBABILITY    0.184053578

The reported binomial probability (0.184) is greater than 0.05, so we fail to reject the null hypothesis: with 19 of the 25 observations above 13000, the data give no evidence that the median is below 13000.
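The tabulated binomial probability equals the probability of exactly 19 successes in 25 trials when the success probability is set to the sample proportion 0.76. A conventional sign test instead evaluates the counts under a success probability of 0.5, which is what holds when the median is exactly 13000. The sketch below computes both figures for comparison; it is a cross-check under that standard formulation, not the calculation used in the original output, and `binom_pmf` is an illustrative helper.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, positives = 25, 19

# Value reported in the table: exactly 19 successes with p = 0.76.
reported = binom_pmf(positives, n, 0.76)

# Conventional sign test: probability of 19 or more positives when p = 0.5.
upper_tail = sum(binom_pmf(k, n, 0.5) for k in range(positives, n + 1))

print(round(reported, 4), round(upper_tail, 4))
```

The upper-tail probability under p = 0.5 is about 0.007, so the conventional test also supports a median above 13000.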

Analysis of incomes

5A
DKINCOME
Mean                        526.6375
Standard Error              17.19441727
Median                      528
Mode                        600
Standard Deviation          343.8883454
Kurtosis                    11.72712285
Skewness                    2.674171798
Range                       2662
Sum                         210655
Count                       400
Confidence Level (95.0%)    33.80297294

The data set has a mean of 526.6375. The distribution is clearly not normal: the skewness is high and positive (2.6742) and the kurtosis is large (11.727). This is also evident from the histogram below, which shows the data skewed to one side.
Histogram
Class        Frequency
0-160        27
161-320      88
321-480      73
481-640      111
641-800      65
801-960      20
961-1120     4
1121-1280    1
1281-1440    1
1441-1600    1
1601-1760    1
1761-1920    1
1921-2080    1
2081-2240    3
2241-2400    1
2401-2560    1
2561-2720    1

The largest class is 481-640 with 111 observations, followed by 161-320 with 88; the remaining classes are as shown in the frequency table. The long tail of sparsely populated classes above 960 shows that the distribution is skewed to the right.
Use descriptive statistics and discuss the shape of the samples compared to the distribution of the total data set.

                      Simple random sample   Stratified/systematic sampling   Total sample
Mean                  557.575                516.925                          526.6375
Standard Error        62.92947709            32.46425                         17.19441727
Median                510                    540                              528
Mode                  600                    600                              600
Standard Deviation    398.0009591            205.3219                         343.8883454
Sample Variance       158404.7635            42157.1                          118259.1941
Kurtosis              (missing)              -0.42397                         11.72712285
Skewness              2.986920769            -0.13469                         2.674171798
Range                 2032                   840                              2662
Minimum               68                     120                              18
Maximum               2100                   960                              2680
Sum                   22303                  20677                            210655
Count                 40                     40                               400

The simple random sample and the total data set are both positively skewed, with skewness coefficients of 2.986920769 and 2.674171798 respectively, which matches the long right tail of the histogram. The sample from the stratified sampling technique, by contrast, has a slightly negative skewness (-0.13469) and is therefore close to symmetric. In terms of outliers, the total data set and the simple random sample contain very large values (maxima of 2680 and 2100), whereas the stratified sample does not (maximum of 960).
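The standard errors in the comparison follow directly from the standard deviations and sample sizes, which explains why the total data set has a much smaller standard error than either sample of 40. The sketch below recomputes them from the reported values; the dictionary names are illustrative.

```python
from math import sqrt

# (standard deviation, count) for each column of the comparison table.
samples = {
    "Simple random sample": (398.0009591, 40),
    "Stratified/systematic sampling": (205.3219, 40),
    "Total sample": (343.8883454, 400),
}

# Standard error of the mean = standard deviation / sqrt(n).
std_errors = {name: sd / sqrt(n) for name, (sd, n) in samples.items()}
for name, se in std_errors.items():
    print(f"{name:32s} SE = {se:.4f}")
```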