
## Data analysis for decision makers

Data analysis can be defined as the application of statistical methods and techniques to explore and describe data in order to obtain useful information. The process involves retrieving original data using special computer applications and systems. This data is then converted and transformed into different classifications and formats of meaningful information that is used to make informed decisions.

### Descriptive statistics

Descriptive statistics is the term given to the analysis of data that helps describe, show or summarize data in a meaningful way. Descriptive statistics do not allow us to make conclusions beyond the data we have analyzed.
Section 1.
Our analysis in Section 1 shows that Call Category 1 “New Business” and Call Category 3 “New Claim” appear most frequently in the sampled calls.
For “New Business”, most call durations are short and concentrated in the lower range. A few very long calls create a long right tail, which pulls the Mean (6.432) above the Median (5.94).
For “New Claim”, most call durations are long and concentrated in the upper range. A few very short calls create a long left tail, which pulls the Mean (6.252) below the Median (6.855).
The average claim value is €498.704 and the average time to process a claim is 9.575 weeks.
Section 2.
This section focuses on the call categories related to claims: Call Category 2 “Query on Existing Claim” and Call Category 3 “New Claim”, which together account for 40 calls.
Of these 40 calls, 19 need a follow-up. The covariance between calls that need a follow-up and calls that do not is -0.394, which indicates a negative relationship.
For the claim process, 44 claims were completed: 18 authorized by AC and 26 authorized by PC. The relationship between the call centre’s claim-related calls and the claim process is -2.889, which is also negative.

### Details of Data Analysis

Section 1
Part I. Based on “Table 1”, the insurance company’s data collected in April 2015, our analysis of the Call Categories is as follows:
The Mode values for Categories 1 and 3 are both equal to 3, the highest among the call categories. This means that Category 1 and Category 3 appear most frequently among the calls, so we focus on these two categories for further analysis.
(A) The data analysis of Call Category 1: “New Business”
From Table 1A (see Appendix I), the summary statistics are as follows.

| Statistic | Value |
|---|---|
| N (number of sample data) | 22 |
| Mean | 6.432 |
| Median | 5.940 |
| Sample Standard Deviation | 2.285 |
| Range | 10.120 |
| Coefficient of Variation | 36% |

The Z-scores of the data were checked and all fall within the range of -3.0 to +3.0. Therefore, no data point was removed as an extreme value.

The Mean represents the average of the data set. In Table 1A, the Mean is 6.432, which means that staff spend 6.432 minutes on average on a “New Business” call.
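The summary measures reported for Table 1A can be computed along the following lines. This is a minimal sketch using Python's standard `statistics` module; the `summarize` helper and the `durations` list are hypothetical and are not the Table 1A data.

```python
import statistics


def summarize(values):
    """Return the summary measures used in the tables of this report."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "n": len(values),
        "mean": mean,
        "median": statistics.median(values),
        "stdev": stdev,
        "range": max(values) - min(values),
        "cv": stdev / mean,  # coefficient of variation, as a fraction
    }


# Hypothetical call durations in minutes (NOT the Table 1A data).
durations = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(durations)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The same helper could be applied to each call category in turn, which keeps the tables for different categories directly comparable.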

The Median of the “New Business” calls is 5.940. Median < Mean, which indicates a right-skewed distribution.

Fig. 1A Distribution curve (positive or right-skewed distribution) for the call duration of “New Business”.

Call Category 1 “New Business” displays a right-skewed distribution. In a right-skewed distribution, most call durations are short and concentrated in the lower range. A few very long calls create a long right tail, which pulls the Mean (6.432) above the Median (5.94). Because the skewness statistic for such a distribution is greater than zero, it is also called a positive skew.

### The data analysis of Call Category 3: “New Claim”

From Table 1B (see Appendix II), the summary statistics are as follows.

| Statistic | Value |
|---|---|
| N (number of sample data) | 22 |
| Mean | 6.252 |
| Median | 6.855 |
| Sample Standard Deviation | 2.499 |
| Range | 9.700 |
| Coefficient of Variation | 40% |

The Z-scores of the data were checked and all fall within the range of +/-3.0. Therefore, no data point was removed as an extreme value.

The Mean represents the average of the data set. In Table 1B, the Mean is 6.252, which means that staff spend 6.252 minutes on average on a “New Claim” call.

The Median of the “New Claim” calls is 6.855. Median > Mean, which indicates a left-skewed distribution.

Fig. 1B Distribution curve (negative or left-skewed distribution) for the call duration of “New Claim”.

Call Category 3 “New Claim” displays a left-skewed distribution. In a left-skewed distribution, most call durations are long and concentrated in the upper range. A few very short calls create a long left tail, which pulls the Mean (6.252) below the Median (6.855). Because the skewness statistic for such a distribution is less than zero, it is also called a negative skew.

In addition, we note that the Sample Standard Deviation of Table 1A (2.285) is smaller than that of Table 1B (2.499). This means the “New Business” call durations lie closer to their mean, whereas the “New Claim” call durations are more spread out around their mean.

Comparing coefficients of variation for category 1 (new business) & category 3 (new claim)

Category 1 – new business: 36%

Category 3 – new claim: 40%

We conclude that, relative to the mean, the call duration of “New Claim” is somewhat more variable than that of “New Business” (40% versus 36%).
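The two percentages follow from dividing each sample standard deviation by its mean. As a quick check, the sketch below recomputes them from the values reported in Tables 1A and 1B; the function name `coefficient_of_variation` is chosen here for illustration only.

```python
def coefficient_of_variation(stdev, mean):
    """Standard deviation expressed as a fraction of the mean."""
    return stdev / mean


# (sample standard deviation, mean) as reported in Tables 1A and 1B.
reported = {
    "New Business": (2.285, 6.432),
    "New Claim": (2.499, 6.252),
}

cv = {name: coefficient_of_variation(s, m) for name, (s, m) in reported.items()}
for name, value in cv.items():
    print(f"{name}: {value:.1%}")
```

Rounded to whole percentages, the results agree with the 36% and 40% quoted above.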

Part II. Based on “Table 2”, the insurance company’s data collected in April 2015, our analysis of the Claim Value and the Time to Process Claim is as follows:

(B) The data analysis of Claim Value

From Table 2A (see Appendix III), the summary statistics are as follows.

| Statistic | Value |
|---|---|
| N (number of sample data) | 44 |
| Mean | 622.493 |
| Median | 478.170 |
| Sample Standard Deviation | 859.626 |
| Range | 5,910.670 |
| Coefficient of Variation | 138% |

The Z-scores of the data were checked. The Claim Value of Claim ID 1011 falls outside the range of +/-3.0, so Claim ID 1011 is treated as an extreme case and removed from our statistical analysis. After removing it, the summary statistics are as follows:

| Statistic | Value |
|---|---|
| N (number of sample data) | 43 |
| Mean | 498.704 |
| Median | 462.200 |
| Sample Standard Deviation | 257.392 |
| Range | 957.92 |
| Coefficient of Variation | 52% |
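The Z-score screening described above can be sketched as follows. The `claims` list is hypothetical (it is not the Table 2A data), and the use of the sample standard deviation is an assumption, since the source does not say which variant was used.

```python
import statistics


def z_scores(values):
    """Z-score of each value, using the sample standard deviation (an assumption)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [(v - mean) / stdev for v in values]


def drop_extremes(values, limit=3.0):
    """Keep only the values whose Z-score lies within +/- limit."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) <= limit]


# Hypothetical claim values in euros with one extreme case (NOT the Table 2A data).
claims = [420, 450, 380, 510, 470, 390, 530, 440, 460, 480,
          410, 500, 430, 490, 6000]
kept = drop_extremes(claims)

print(f"removed: {sorted(set(claims) - set(kept))}")
print(f"stdev before: {statistics.stdev(claims):.1f}, after: {statistics.stdev(kept):.1f}")
```

As in the report, a single extreme value inflates the mean and the standard deviation considerably, so the summary statistics change sharply once it is removed.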

The Mean represents the average of the data set. In Table 2A, the Mean is 498.704, which means that the claim value is €498.704 on average. The Median of the Claim Value is 462.2. Median < Mean, which indicates a right-skewed distribution.

Fig. 2A Distribution curve (positive or right-skewed distribution) for the Claim Value.

The Claim Value displays a right-skewed distribution. In a right-skewed distribution, most claim values are low and concentrated in the lower range. A few very high claim values create a long right tail, which pulls the Mean (498.704) above the Median (462.2). Because the skewness statistic for such a distribution is greater than zero, it is also called a positive skew.
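The direction of skew inferred from comparing the mean and the median can be put on a rough numerical scale with Pearson's second (median) skewness coefficient, 3 × (mean − median) / standard deviation. This is not necessarily the skewness statistic used in the original analysis; it is shown here only because it can be computed from the reported summary values.

```python
def pearson_median_skewness(mean, median, stdev):
    """Pearson's second skewness coefficient: 3 * (mean - median) / stdev."""
    return 3 * (mean - median) / stdev


# (mean, median, sample standard deviation) as reported in this section.
reported = {
    "New Business call duration": (6.432, 5.940, 2.285),
    "New Claim call duration": (6.252, 6.855, 2.499),
    "Claim Value (extreme case removed)": (498.704, 462.200, 257.392),
}

skew = {name: pearson_median_skewness(*stats) for name, stats in reported.items()}
for name, value in skew.items():
    direction = "right-skewed" if value > 0 else "left-skewed"
    print(f"{name}: {value:+.3f} ({direction})")
```

The signs agree with the text: positive for “New Business” and for Claim Value, negative for “New Claim”.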

### Random Variables

In the following section, we use a discrete random variable and its probability distribution to explain the relationship between the claim-related calls received by the call centre and the claims handled by the related staff.

Since Call Category 1 “New Business” and Call Category 4 “Service Cancellation” do not involve claims, the data analysis focuses on Call Category 2 “Query on Existing Claim” and Call Category 3 “New Claim”.

(A) Data analysis of Call Category 2: “Query on Existing Claim” & Call Category 3: “New Claim”

According to Table 1 (see Appendix V), the call centre received 81 calls in total, of which 18 fall in Call Category 2 and 22 in Call Category 3 (see the chart below).

Chart of Probabilities of Call Category Id

(B) Data analysis of Call of Claims

Call Categories 2 and 3 together account for 40 claim-related calls, of which only 19 need a follow-up (see the table below):

| Category Id | # of calls per Category Id | Call needs follow-up (Yes) | Call needs follow-up (No) |
|---|---|---|---|
| 2 | 18 | 5 | 13 |
| 3 | 22 | 14 | 8 |
| Total | 40 | 19 | 21 |
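The counts above translate directly into the empirical probabilities of the discrete random variables. The sketch below derives them from the reported counts only; the variable names are illustrative, and the calculation does not attempt to reproduce the covariance figures, whose exact computation is given in the appendices.

```python
total_calls = 81  # all calls received by the call centre (Table 1)

# Follow-up counts per claim-related call category, from the table above.
follow_up = {
    2: {"yes": 5, "no": 13},   # Query on Existing Claim
    3: {"yes": 14, "no": 8},   # New Claim
}

calls_per_category = {cat: c["yes"] + c["no"] for cat, c in follow_up.items()}
claim_calls = sum(calls_per_category.values())

# Probability that a random call falls in each category.
p_category = {cat: n / total_calls for cat, n in calls_per_category.items()}
# Probability that a call needs a follow-up, given its category.
p_follow_up = {cat: c["yes"] / calls_per_category[cat] for cat, c in follow_up.items()}
# Overall follow-up rate among claim-related calls.
overall_follow_up = sum(c["yes"] for c in follow_up.values()) / claim_calls

for cat in follow_up:
    print(f"Category {cat}: P(category) = {p_category[cat]:.3f}, "
          f"P(follow-up | category) = {p_follow_up[cat]:.3f}")
print(f"Overall follow-up rate: {overall_follow_up:.3f}")
```

On these counts, “New Claim” calls need a follow-up far more often than “Query on Existing Claim” calls.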

According to Table 2 (Appendix VI), the covariance is -0.394, which indicates a negative relationship between calls that need a follow-up (Yes) and calls that do not (No).

According to the above data, the call centre

According to Table 3 (Appendix VII), 44 claims were completed in total: 18 authorized by AC and 26 authorized by PC.

Hence, the call categories received by the call centre (Appendix VIII, Tables 5A & 5B) have a negative relationship (-2.889) with the claims handled by the different staff.

### Section 3

3a. In this section we use Claim Value as our continuous random variable and compute its 95% confidence interval, that is, the lower and upper limits of the average claim value for the population at 95% confidence.

As the Q-Q plot for the variable shows (Appendix IX), the points lie approximately on a straight line, so the variable can be treated as approximately normally distributed. As mentioned in Section 1, the Z-scores of the data were checked. The Claim Value of Claim ID 1011 falls outside the range of +/-3.0, so Claim ID 1011 is treated as an extreme case and removed from our statistical analysis.

We need to estimate the population mean from the sample mean and the sample standard deviation, because the population standard deviation is unknown. We therefore use the t-statistic for the estimate.

In Excel, the 95% (alpha = 0.05) confidence interval for the variable can be calculated using

[Mean - CONFIDENCE.T(alpha, standard_dev, size), Mean + CONFIDENCE.T(alpha, standard_dev, size)]

The margin of 79.214 used below corresponds to the t-based function CONFIDENCE.T. The older CONFIDENCE function uses the normal distribution and gives a slightly narrower margin.

| Claim Value (€) | |
|---|---|
| Sample Size | 43 |
| Mean | 498.70 |
| Standard Deviation | 257.3923 |
| alpha | 0.05 |

Lower limit = 498.704 - 79.214 = 419.49
Upper limit = 498.704 + 79.214 = 577.92
Hence, the 95% confidence interval can be written as (419.49, 577.92).
This means we can say with 95% confidence that the average claim value for the population lies between €419.49 and €577.92.
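The interval can be reproduced outside Excel. The sketch below uses only the Python standard library: it integrates the Student's t density numerically and finds the critical value by bisection. The helpers `t_pdf`, `t_cdf` and `t_quantile` are written for this illustration, and a statistics library would normally provide the quantile directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus 0.5 for the left half."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for p >= 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Values reported for Claim Value after removing Claim ID 1011.
n, mean, stdev, alpha = 43, 498.704, 257.3923, 0.05

t_crit = t_quantile(1 - alpha / 2, n - 1)   # two-sided critical value, 42 d.f.
margin = t_crit * stdev / math.sqrt(n)      # half-width of the interval
lower, upper = mean - margin, mean + margin
print(f"t = {t_crit:.4f}, margin = {margin:.3f}, CI = ({lower:.2f}, {upper:.2f})")
```

The margin agrees with the 79.214 used above, which confirms that the reported interval is t-based with 42 degrees of freedom.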

### Section 4

In this section, we analyse the impact of claim value on the time to process a claim (in weeks) using the classical linear regression model, and consider its use for interpolation and extrapolation.
Model: Yi = a + b*Xi + ei
where Yi = Time to process a claim (in weeks)
Xi = Claim value (€)
ei = Error term
Regression results are as follows: As the results show, R² for the model is very low and the F statistic is not significant (Significance F = 0.613643). Hence, the model is not statistically significant, and the data show no significant linear relationship between claim value and time to process a claim.
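For readers who want to see how the quantities in such a regression output arise, here is a minimal ordinary least squares sketch for the model Yi = a + b*Xi + ei. The `fit_line` helper and the data are hypothetical (they are not the Table 2 data), and the sketch stops at the F statistic: turning it into a Significance F value additionally requires the F distribution with 1 and n - 2 degrees of freedom.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x.

    Returns (a, b, r_squared, f_statistic).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # intercept
    sst = sum((y - mean_y) ** 2 for y in ys)                      # total sum of squares
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))     # residual sum of squares
    r_squared = 1 - sse / sst
    f_statistic = (sst - sse) / (sse / (n - 2))  # one regressor, n - 2 residual d.f.
    return a, b, r_squared, f_statistic


# Hypothetical data: claim value in euros and weeks to process (NOT the Table 2 data).
claim_values = [100, 200, 300, 400, 500]
weeks = [2, 4, 5, 4, 5]

a, b, r_squared, f_statistic = fit_line(claim_values, weeks)
print(f"a = {a:.3f}, b = {b:.4f}, R^2 = {r_squared:.3f}, F = {f_statistic:.3f}")
```

A low R² together with a large Significance F, as reported above, means the fitted line explains little of the variation in processing time, so predictions from it would not be reliable for either interpolation or extrapolation.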