# Statistical Analysis of Loan Approval Data: Hypotheses, Methods, and Results

September 12, 2023
Olivia Martin
🇺🇸 United States
SAS
Olivia Martin is a SAS statistics expert with 5+ years of experience and a master's degree in statistics from Princeton University. She specializes in helping students complete assignments and understand the underlying material.

This analysis uses SAS to examine a loan dataset. We test five hypotheses covering Fico score, interest rate, loan grade, loan amount, and employment years, with a focus on how these factors relate to whether a loan is fully paid or charged off. The results are reported in tables alongside the test statistics, and the SAS code used for the analysis is included at the end.

## Problem Description

In this SAS assignment, we aim to analyze a loan dataset to determine the factors that affect the status of a loan, specifically whether it will be fully paid or charged off. The dataset contains 26 variables and 39,786 observations. For the purpose of this study, we focus on six key variables:

1. Loan Status: Whether the loan is fully paid or charged off (categorical, 2 levels).
2. Fico Score: Fair Isaac Corporation score (continuous).
3. Interest Rate: Interest rate charged on the loan (continuous, %).
4. Loan Grade: Grade assigned to the loan (categorical, 7 levels).
5. Loan Amount: The approved loan amount (continuous).
6. Employment Years: Years of employment (categorical, 11 levels).

### Milestone One: Hypotheses

We start by formulating five hypotheses for our analysis:

1. Hypothesis 1: Fico Score

• Null Hypothesis (H01): There is no significant difference in the average Fico score between those who fully paid and those who charged off.
• Alternative Hypothesis (H11): There is a significant difference in the average Fico score between those who fully paid and those who charged off.

| Loan Status | Method | Mean | 95% CL Mean (Lower) | 95% CL Mean (Upper) | Std Dev | 95% CL Std Dev (Lower) |
|---|---|---|---|---|---|---|
| Charged Off | | 707.6 | 706.8 | | 31.87 | 31.30 |
| Fully Paid | | 720.9 | 720.5 | | 36.11 | 35.85 |
| Difference | Pooled | -13.25 | -14.25 | -12.25 | 35.54 | |
| Difference | Satterthwaite | -13.25 | -14.17 | -12.34 | | |

Table 1: Hypothesis 1 - Fico Score

2. Hypothesis 2: Interest Rate

• Null Hypothesis (H02): There is no significant difference in the average interest rate between those who fully paid and those who charged off.
• Alternative Hypothesis (H12): There is a significant difference in the average interest rate between those who fully paid and those who charged off.
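
| Loan Status | Method | Mean | 95% CL Mean (Lower) | 95% CL Mean (Upper) | Std Dev | 95% CL Std Dev (Lower) |
|---|---|---|---|---|---|---|
| Charged Off | | 0.1384 | 0.1374 | | 0.0366 | 0.0359 |
| Fully Paid | | 0.1173 | 0.1169 | | 0.0365 | 0.0363 |
| Difference | Pooled | 0.0211 | 0.0201 | 0.0221 | 0.0365 | |
| Difference | Satterthwaite | 0.0211 | 0.0201 | 0.0221 | | |

Table 2: Hypothesis 2 - Interest Rate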

3. Hypothesis 3: Loan Status and Loan Grade

• Null Hypothesis (H03): There is no significant association between loan status and loan grade.
• Alternative Hypothesis (H13): There is a significant association between loan status and loan grade.

| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 6 | 1472.8151 | <.0001 |
| Likelihood Ratio Chi-Square | 6 | 1475.8336 | <.0001 |
| Mantel-Haenszel Chi-Square | 1 | 1461.2862 | <.0001 |
| Phi Coefficient | | 0.1924 | |
| Contingency Coefficient | | 0.1889 | |
| Cramer's V | | 0.1924 | |

Table 3: Hypothesis 3 - Loan Status and Loan Grade

4. Hypothesis 4: Fico Score and Interest Rate

• Null Hypothesis (H04): There is no significant relationship between Fico score and interest rate.
• Alternative Hypothesis (H14): There is a significant relationship between Fico score and interest rate.

| | Fico Range High | Fico Range Low |
|---|---|---|
| int_rate (Pearson r) | -0.70279 | -0.70279 |
| p-value | <.001 | <.001 |

Table 4: Hypothesis 4 - Fico Score and Interest Rate

5. Hypothesis 5: Loan Amount Across Employment Years

• Null Hypothesis (H05): There is no significant difference in the average loan amount across employment years.
• Alternative Hypothesis (H15): There is a significant difference in the average loan amount across employment years.

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 11 | 56229125910 | 5111738719.1 | 101.51 | <.0001 |
| Error | 39774 | 2.0028333E12 | 50355340.338 | | |
| Corrected Total | 39785 | 2.0590624E12 | | | |

Table 5: Hypothesis 5 - Loan Amount Across Employment Years

### Milestone Two: Statistical Approaches

To test these hypotheses, we employ the following statistical approaches:

• Hypotheses 1 and 2: We use an independent-samples t-test, which is appropriate for comparing the mean of a continuous variable (Fico score and interest rate) between two independent groups (fully paid and charged off).
• Hypothesis 3: To test the association between two categorical variables (loan status and loan grade), we use a chi-square test of independence.
• Hypothesis 4: To assess the relationship between two continuous variables (Fico score and interest rate), we use a Pearson correlation test.
• Hypothesis 5: We conduct a one-way ANOVA to test for differences in the average loan amount across employment-year groups.
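
For reference, the test statistics behind these four procedures can be written as follows. These are the standard textbook forms rather than formulas quoted from the assignment, and the symbols (group sizes $n_1, n_2$, pooled standard deviation $s_p$, observed and expected counts $O_{ij}, E_{ij}$, row and column totals $R_i, C_j$, number of groups $k$) are notation introduced here for illustration.

$$
\begin{aligned}
t &= \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}},
\qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \\[4pt]
\chi^2 &= \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{R_i\,C_j}{n} \\[4pt]
r &= \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \,\sum_{i}(y_i - \bar{y})^2}} \\[4pt]
F &= \frac{SS_{\text{model}}/(k - 1)}{SS_{\text{error}}/(n - k)}
\end{aligned}
$$

The pooled t statistic has $n_1 + n_2 - 2$ degrees of freedom, the chi-square statistic has $(\text{rows} - 1)(\text{columns} - 1)$, and the F statistic has $(k - 1,\ n - k)$.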

### Milestone Three: Results

Our analysis yields the following results:

Hypothesis 1: Fico Score

• Independent t-test shows a significant difference in Fico scores (t(39784) = -26.00, p < .001). Those who fully paid had a higher Fico score.

| Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Pooled | Equal | 39784 | -26.00 | <.0001 |
| Satterthwaite | Unequal | 8283.8 | -28.42 | <.0001 |

Table 6: Hypothesis 1 - Fico Score Results
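
As a consistency check on Table 6, the standard error of the mean difference can be backed out from the reported difference and the pooled t value, then used to approximate the 95% confidence limits shown in Table 1. This is a sketch that uses the normal critical value 1.96 (reasonable with 39,784 degrees of freedom); it is not SAS output. The subscripts CO and FP stand for the charged-off and fully-paid groups.

$$
SE = \frac{\bar{x}_{\text{CO}} - \bar{x}_{\text{FP}}}{t} = \frac{-13.25}{-26.00} \approx 0.510,
\qquad
-13.25 \pm 1.96 \times 0.510 \approx (-14.25,\ -12.25)
$$

The interval agrees with the pooled row of Table 1 and excludes zero, which is consistent with rejecting the null hypothesis.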

Hypothesis 2: Interest Rate

• Independent t-test shows a significant difference in interest rates (t(39784) = 40.27, p < .001). The t value is positive because the difference is computed as charged off minus fully paid. Those who fully paid had a lower interest rate.

| Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Pooled | Equal | 39784 | 40.27 | <.0001 |
| Satterthwaite | Unequal | 7672.1 | 40.26 | <.0001 |

Table 7: Hypothesis 2 - Interest Rate Results
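
With nearly 40,000 observations, even small differences reach statistical significance, so it helps to look at the size of each difference as well. One common measure is Cohen's d, the absolute mean difference divided by the pooled standard deviation. The values below are computed here from the pooled rows of Tables 1 and 2; they are an illustration and not part of the original SAS output.

$$
d_{\text{Fico}} = \frac{|-13.25|}{35.54} \approx 0.37,
\qquad
d_{\text{rate}} = \frac{0.0211}{0.0365} \approx 0.58
$$

By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), the interest rate gap is a medium-sized effect and the Fico score gap falls between small and medium.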

Hypothesis 3: Loan Status and Loan Grade

• Chi-square test results indicate a significant association between loan status and loan grade (χ²(6) = 1472.82, p < .001). Cramer's V shows a weak association.

| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 6 | 1472.8151 | <.0001 |
| Likelihood Ratio Chi-Square | 6 | 1475.8336 | <.0001 |
| Mantel-Haenszel Chi-Square | 1 | 1461.2862 | <.0001 |
| Phi Coefficient | | 0.1924 | |
| Contingency Coefficient | | 0.1889 | |
| Cramer's V | | 0.1924 | |

Table 8: Hypothesis 3 - Loan Status and Loan Grade Results
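
The effect-size rows in Table 8 follow directly from the chi-square value. The worked calculation below assumes all $n = 39{,}786$ observations enter the test (no missing values in either variable) and uses the fact that loan status has two levels, so $\min(\text{rows} - 1, \text{columns} - 1) = 1$.

$$
V = \sqrt{\frac{\chi^2}{n \cdot \min(\text{rows} - 1,\ \text{columns} - 1)}} = \sqrt{\frac{1472.8151}{39786 \times 1}} \approx 0.1924,
\qquad
C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = \sqrt{\frac{1472.8151}{41258.8151}} \approx 0.1889
$$

Both values match the reported Cramer's V and contingency coefficient. Because one variable has only two levels, Cramer's V equals the phi coefficient here.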

Hypothesis 4: Fico Score and Interest Rate

• Correlation analysis reveals a strong and significant negative correlation between Fico score and interest rate (r = -0.703, p < .001).

Pearson Correlation Coefficients, N = 39786

| | fico_range_high | fico_range_low |
|---|---|---|
| int_rate (Pearson r) | -0.70279 | -0.70279 |
| p-value | <.001 | <.001 |

Table 9: Hypothesis 4 - Fico Score and Interest Rate Results
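
To put the correlation in Table 9 in context, the squared coefficient gives the share of variance the two variables have in common, and the usual t statistic for a Pearson correlation shows why the p-value is so small. Both are computed here from the reported r and N as an illustration; neither appears in the original output.

$$
r^2 = (-0.70279)^2 \approx 0.494,
\qquad
t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{-0.70279 \times \sqrt{39784}}{\sqrt{1 - 0.494}} \approx -197
$$

Roughly half of the variation in interest rate is shared with Fico score.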

Hypothesis 5: Loan Amount Across Employment Years

• One-way ANOVA demonstrates a significant difference in average loan amount across employment years (F(11, 39774) = 101.51, p < .001). Loan amount varies with employment years.

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 11 | 56229125910 | 5111738719.1 | 101.51 | <.0001 |
| Error | 39774 | 2.0028333E12 | 50355340.338 | | |
| Corrected Total | 39785 | 2.0590624E12 | | | |

Table 10: Hypothesis 5 - Loan Amount Across Employment Years Results
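
The entries in Table 10 can be reproduced from the sums of squares. The F calculation below restates the table; the effect size $\eta^2$ (the share of loan amount variance explained by employment years) is computed here for illustration and is not reported in the original output.

$$
MS_{\text{model}} = \frac{56229125910}{11} \approx 5111738719.1,
\qquad
F = \frac{5111738719.1}{50355340.338} \approx 101.51
$$

$$
\eta^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}} = \frac{5.6229 \times 10^{10}}{2.0591 \times 10^{12}} \approx 0.027
$$

Although the difference is highly significant, employment years account for only about 2.7% of the variation in loan amount.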

### Summary

Borrowers who fully paid their loans had higher average Fico scores and lower average interest rates than those who charged off. Fico score and interest rate are strongly negatively correlated (r = -0.703), so higher Fico scores are associated with lower interest rates. Loan status is significantly, though weakly, associated with loan grade (Cramer's V = 0.19). The average loan amount differs significantly across employment years, with more years of employment associated with higher loan amounts.

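```sas
/* Import the loan data from the Excel workbook */
FILENAME REFFILE '/home/u41099423/sasuser.v94/Loan_Data.xlsx';

PROC IMPORT DATAFILE=REFFILE
    DBMS=XLSX
    OUT=WORK.IMPORT;
    GETNAMES=YES;
RUN;

/* List the variables and their types */
PROC CONTENTS DATA=WORK.IMPORT; RUN;

/*** H1: Fico score by loan status ***/
/* Test for normality within each loan status group */
proc univariate data=WORK.IMPORT normal mu0=0;
    ods select TestsForNormality;
    class loan_status;
    var fico_range_high;
run;

/* Two-sided independent-samples t test */
proc ttest data=WORK.IMPORT sides=2 h0=0 plots(showh0);
    class loan_status;
    var fico_range_high;
run;

/*** H2: Interest rate by loan status ***/
/* Test for normality within each loan status group */
proc univariate data=WORK.IMPORT normal mu0=0;
    ods select TestsForNormality;
    class loan_status;
    var int_rate;
run;

/* Two-sided independent-samples t test */
proc ttest data=WORK.IMPORT sides=2 h0=0 plots(showh0);
    class loan_status;
    var int_rate;
run;

/*** H3: Loan status by loan grade (chi-square test) ***/
proc freq data=WORK.IMPORT;
    tables (loan_status)*(loan_grade) / chisq measures nopercent norow nocum
        plots(only)=(freqplot mosaicplot);
run;

/*** H4: Fico score and interest rate (Pearson correlation) ***/
/* NOPROB is left out so the p-values shown in Table 9 are printed */
proc corr data=WORK.IMPORT pearson nosimple plots=none;
    var fico_range_high fico_range_low;
    with int_rate;
run;

/*** H5: Loan amount across employment years (one-way ANOVA) ***/
proc glm data=WORK.IMPORT;
    class emp_length;
    model loaned_amt=emp_length;
    /* Levene's test for equal variances and Welch's ANOVA */
    means emp_length / hovtest=levene welch plots=none;
run;
quit;
```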