## Problem Description

In this SAS assignment, we aim to analyze a loan dataset to determine the factors that affect the status of a loan, specifically whether it will be fully paid or charged off. The dataset contains 26 variables and 39,786 observations. For the purpose of this study, we focus on six key variables:

- **Loan Status:** Whether the loan is fully paid or charged off (categorical, 2 levels).
- **FICO Score:** Fair Isaac Corporation score (continuous).
- **Interest Rate:** Interest rate charged on the loan (continuous, %).
- **Loan Grade:** The grade of the loan (categorical, 7 levels).
- **Loan Amount:** The approved loan amount (continuous).
- **Employment Years:** Years of employment (categorical, 11 levels).

**Milestone One: Hypotheses**

We start by formulating five hypotheses for our analysis:

**1. Hypothesis 1:** FICO Score

- **Null Hypothesis (H01):** There is no significant difference in the average FICO score between borrowers whose loans were fully paid and those whose loans were charged off.
- **Alternative Hypothesis (H11):** There is a significant difference in the average FICO score between borrowers whose loans were fully paid and those whose loans were charged off.

Loan Status | Method | Mean | 95% CL Mean (Lower) | 95% CL Mean (Upper) | Std Dev | 95% CL Std Dev (Lower) |
---|---|---|---|---|---|---|
Charged Off | | 707.6 | 706.8 | | 31.87 | 31.30 |
Fully Paid | | 720.9 | 720.5 | | 36.11 | 35.85 |
Difference | Pooled | -13.25 | -14.25 | -12.25 | 35.54 | |
Difference | Satterthwaite | -13.25 | -14.17 | -12.34 | | |

**Table 1:** Hypothesis 1 - FICO Score

**2. Hypothesis 2:** Interest Rate

- **Null Hypothesis (H02):** There is no significant difference in the average interest rate between loans that were fully paid and loans that were charged off.
- **Alternative Hypothesis (H12):** There is a significant difference in the average interest rate between loans that were fully paid and loans that were charged off.

Loan Status | Method | Mean | 95% CL Mean (Lower) | 95% CL Mean (Upper) | Std Dev | 95% CL Std Dev (Lower) |
---|---|---|---|---|---|---|
Charged Off | | 0.1384 | 0.1374 | | 0.0366 | 0.0359 |
Fully Paid | | 0.1173 | 0.1169 | | 0.0365 | 0.0363 |
Difference | Pooled | 0.0211 | 0.0201 | 0.0221 | 0.0365 | |
Difference | Satterthwaite | 0.0211 | 0.0201 | 0.0221 | | |

**Table 2:** Hypothesis 2 - Interest Rate

**3. Hypothesis 3:** Loan Status and Loan Grade

- **Null Hypothesis (H03):** There is no significant association between loan status and loan grade.
- **Alternative Hypothesis (H13):** There is a significant association between loan status and loan grade.

Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 6 | 1472.8151 | <.0001 |
Likelihood Ratio Chi-Square | 6 | 1475.8336 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 1461.2862 | <.0001 |
Phi Coefficient | | 0.1924 | |
Contingency Coefficient | | 0.1889 | |
Cramer's V | | 0.1924 | |

**Table 3:** Hypothesis 3 - Loan Status and Loan Grade

**4. Hypothesis 4:** FICO Score and Interest Rate

- **Null Hypothesis (H04):** There is no significant relationship between FICO score and interest rate.
- **Alternative Hypothesis (H14):** There is a significant relationship between FICO score and interest rate.

Variable | fico_range_high | fico_range_low |
---|---|---|
int_rate | -0.70279 | -0.70279 |
p-value | <.001 | <.001 |

**Table 4:** Hypothesis 4 - FICO Score and Interest Rate

**5. Hypothesis 5:** Loan Amount Across Employment Years

- **Null Hypothesis (H05):** There is no significant difference in the average loan amount across employment years.
- **Alternative Hypothesis (H15):** There is a significant difference in the average loan amount across employment years.

Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 11 | 56229125910 | 5111738719.1 | 101.51 | <.0001 |
Error | 39774 | 2.0028333E12 | 50355340.338 | | |
Corrected Total | 39785 | 2.0590624E12 | | | |

**Table 5:** Hypothesis 5 - Loan Amount Across Employment Years

**Milestone Two: Statistical Approaches**

To test these hypotheses, we employ the following statistical approaches:

- **Hypotheses 1 and 2:** We use an independent-samples t-test, which is appropriate for comparing the mean of a continuous variable (FICO score, interest rate) between two independent groups (fully paid and charged off).
- **Hypothesis 3:** To test the association between two categorical variables (loan status and loan grade), we use a chi-square test of independence.
- **Hypothesis 4:** To measure the relationship between two continuous variables (FICO score and interest rate), we use a Pearson correlation test.
- **Hypothesis 5:** We conduct a one-way ANOVA to test whether the average loan amount differs across the employment-year groups.
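The selection logic above can be sketched as a small lookup on variable types. This is only an illustration of the reasoning, not part of the SAS workflow; the function name `choose_test` and the string labels are hypothetical.

```python
def choose_test(outcome: str, predictor: str, n_groups: int = 0) -> str:
    """Pick a test from variable types ('continuous' or 'categorical').

    n_groups is the number of levels of a categorical predictor; it only
    matters when the outcome is continuous.
    """
    if outcome == "continuous" and predictor == "categorical":
        if n_groups < 2:
            raise ValueError("need at least two groups to compare means")
        # Two groups: t-test. More than two groups: one-way ANOVA.
        return "independent-samples t-test" if n_groups == 2 else "one-way ANOVA"
    if outcome == "categorical" and predictor == "categorical":
        return "chi-square test of independence"
    if outcome == "continuous" and predictor == "continuous":
        return "Pearson correlation"
    raise ValueError(f"no rule for outcome={outcome!r}, predictor={predictor!r}")


# The five hypotheses described by variable type (labels are illustrative).
plan = {
    "H1 FICO score by loan status": choose_test("continuous", "categorical", 2),
    "H2 interest rate by loan status": choose_test("continuous", "categorical", 2),
    "H3 loan status by loan grade": choose_test("categorical", "categorical"),
    "H4 FICO score vs interest rate": choose_test("continuous", "continuous"),
    "H5 loan amount by employment years": choose_test("continuous", "categorical", 11),
}

for name, test in plan.items():
    print(f"{name}: {test}")
```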

**Milestone Three: Results**

Our analysis yields the following results:

**Hypothesis 1:** FICO Score

- The independent-samples t-test shows a significant difference in FICO scores (t(39784) = -26.00, p < .001). Borrowers whose loans were fully paid had a higher mean FICO score (720.9) than those whose loans were charged off (707.6).

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 39784 | -26.00 | <.0001 |
Satterthwaite | Unequal | 8283.8 | -28.42 | <.0001 |

**Table 6:** Hypothesis 1 - FICO Score Results
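As a rough consistency check, the pooled t value can be approximately recovered from the mean difference and its 95% confidence limits in Table 1. The sketch below assumes the normal critical value is an adequate stand-in for the t critical value at 39,784 degrees of freedom; the variable names are illustrative.

```python
from statistics import NormalDist

# Values copied from the pooled row of Table 1.
diff = -13.25                      # mean difference (charged off minus fully paid)
ci_low, ci_high = -14.25, -12.25   # 95% confidence limits of the difference

# Assumption: with this many degrees of freedom the t critical value is
# practically equal to the normal quantile (about 1.96).
z_crit = NormalDist().inv_cdf(0.975)

# A 95% interval spans 2 * critical value * standard error.
std_error = (ci_high - ci_low) / (2 * z_crit)
t_stat = diff / std_error

print(f"standard error ~ {std_error:.3f}")
print(f"t ~ {t_stat:.2f}")
```

The recovered value lands close to the reported -26.00; the small gap is expected because the confidence limits in Table 1 are rounded to two decimals.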

**Hypothesis 2:** Interest Rate

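- The independent-samples t-test shows a significant difference in interest rates (t(39784) = 40.27, p < .001; the difference is computed as charged off minus fully paid). Fully paid loans carried a lower mean interest rate (0.1173) than charged-off loans (0.1384).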

Method | Variances | DF | t Value | Pr > \|t\| |
---|---|---|---|---|
Pooled | Equal | 39784 | 40.27 | <.0001 |
Satterthwaite | Unequal | 7672.1 | 40.26 | <.0001 |

**Table 7:** Hypothesis 2 - Interest Rate Results
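With almost 40,000 observations, even modest differences produce very small p-values, so a standardized effect size helps put the two t-tests in context. The sketch below computes Cohen's d from the mean differences and pooled standard deviations in Tables 1 and 2. Cohen's d is not part of the original SAS output; it is added here as an illustrative supplement, and the names used are hypothetical.

```python
# Mean differences and pooled standard deviations copied from Tables 1 and 2
# (differences are charged off minus fully paid).
pooled_rows = {
    "FICO score": {"diff": -13.25, "pooled_sd": 35.54},
    "Interest rate": {"diff": 0.0211, "pooled_sd": 0.0365},
}


def cohens_d(diff: float, pooled_sd: float) -> float:
    """Standardized mean difference: difference divided by pooled SD."""
    return diff / pooled_sd


effect_sizes = {name: cohens_d(**row) for name, row in pooled_rows.items()}

for name, d in effect_sizes.items():
    print(f"{name}: d = {d:.3f}")
```

Under Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), the FICO difference would fall between small and medium and the interest rate difference slightly above medium, in absolute value.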

**Hypothesis 3:** Loan Status and Loan Grade

- The chi-square test indicates a significant association between loan status and loan grade (χ²(6) = 1472.82, p < .001). Cramér's V = 0.19 indicates that the association, although significant, is weak.

Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 6 | 1472.8151 | <.0001 |
Likelihood Ratio Chi-Square | 6 | 1475.8336 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 1461.2862 | <.0001 |
Phi Coefficient | | 0.1924 | |
Contingency Coefficient | | 0.1889 | |
Cramer's V | | 0.1924 | |

**Table 8:** Hypothesis 3 - Loan Status and Loan Grade Results
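The association measures in Table 8 follow directly from the Pearson chi-square and the sample size. The sketch below reproduces them, assuming all 39,786 observations entered the 2 x 7 table (no missing values were dropped); variable names are illustrative.

```python
import math

chi_square = 1472.8151   # Pearson chi-square from Table 8
n = 39786                # assumed number of observations in the table
rows, cols = 2, 7        # loan status levels x loan grade levels

# Cramer's V: sqrt(chi2 / (n * (min(rows, cols) - 1))).
# With a two-level variable this equals the phi coefficient.
cramers_v = math.sqrt(chi_square / (n * (min(rows, cols) - 1)))

# Contingency coefficient: sqrt(chi2 / (chi2 + n)).
contingency_c = math.sqrt(chi_square / (chi_square + n))

print(f"Cramer's V ~ {cramers_v:.4f}")
print(f"Contingency coefficient ~ {contingency_c:.4f}")
```

Both values agree with the SAS output to four decimals, which also explains why the phi coefficient and Cramér's V are identical here.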

**Hypothesis 4:** FICO Score and Interest Rate

- Correlation analysis reveals a strong and significant negative correlation between FICO score and interest rate (r = -0.703, p < .001).

Pearson Correlation Coefficients, N = 39786

Variable | fico_range_high | fico_range_low |
---|---|---|
int_rate | -0.70279 | -0.70279 |
p-value | <.001 | <.001 |

**Table 9:** Hypothesis 4 - FICO Score and Interest Rate Results
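To make the strength of this correlation concrete, the sketch below derives the share of variance the two variables have in common (r squared) and the t statistic for testing a zero correlation, using only r and N from Table 9. The names are illustrative.

```python
import math

r = -0.70279   # Pearson correlation from Table 9
n = 39786      # sample size from Table 9

# Proportion of variance shared by FICO score and interest rate.
r_squared = r ** 2

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r_squared))

print(f"r^2 ~ {r_squared:.3f}")
print(f"t ~ {t_stat:.1f} on {n - 2} degrees of freedom")
```

Roughly half of the variance is shared, and the test statistic is far out in the tail, which is consistent with the reported p < .001.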

**Hypothesis 5:** Loan Amount Across Employment Years

- One-way ANOVA shows a significant difference in average loan amount across employment-year groups (F(11, 39774) = 101.51, p < .001), so the mean loan amount is not the same for every group.

Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
---|---|---|---|---|---|
Model | 11 | 56229125910 | 5111738719.1 | 101.51 | <.0001 |
Error | 39774 | 2.0028333E12 | 50355340.338 | | |
Corrected Total | 39785 | 2.0590624E12 | | | |

**Table 10:** Hypothesis 5 - Loan Amount Across Employment Years Results
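The F value in Table 10 is the ratio of the model and error mean squares, each obtained by dividing a sum of squares by its degrees of freedom. The sketch below reproduces that arithmetic and also computes eta-squared (model sum of squares over total sum of squares). Eta-squared is not part of the original output and is added here only as an illustrative effect size; the names are hypothetical.

```python
# Values copied from Table 10.
ss_model = 56229125910     # Sum of Squares, Model
ss_error = 2.0028333e12    # Sum of Squares, Error
ss_total = 2.0590624e12    # Sum of Squares, Corrected Total
df_model, df_error = 11, 39774

# Mean square = sum of squares / degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F statistic = model mean square / error mean square.
f_value = ms_model / ms_error

# Eta-squared: share of total variation in loan amount explained by the groups.
eta_squared = ss_model / ss_total

print(f"F({df_model}, {df_error}) ~ {f_value:.2f}")
print(f"eta-squared ~ {eta_squared:.4f}")
```

The recomputed F matches the reported 101.51. Eta-squared comes out below 3%, so the group differences are statistically significant but account for only a small part of the variation in loan amount. A model DF of 11 also implies 12 groups in the fitted data, one more than the 11 levels listed in the variable description; a separate category for missing employment length would be one possible explanation, though that is an assumption.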

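**Summary:** Borrowers whose loans were fully paid had higher average FICO scores (720.9 vs. 707.6) and lower average interest rates (0.1173 vs. 0.1384) than borrowers whose loans were charged off. FICO score and interest rate are strongly negatively correlated (r = -0.703), so higher FICO scores are associated with lower interest rates. Loan status is significantly, though weakly, associated with loan grade (Cramér's V = 0.19). The average loan amount differs significantly across employment-year groups. The overall F test does not by itself indicate the direction of those differences, so the pattern of larger loans for longer employment should be read from the group means and Tukey comparisons produced by the LSMEANS statement.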

```
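/* Import the loan workbook into WORK.IMPORT */
FILENAME REFFILE '/home/u41099423/sasuser.v94/Loan_Data.xlsx';
PROC IMPORT DATAFILE=REFFILE
    DBMS=XLSX
    OUT=WORK.IMPORT;
    GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.IMPORT; RUN;

/* Assumption: every analysis below reads WORK.IMPORT, the table created above */

/*** H1: FICO score by loan status ***/
/* Test for normality within each loan status group */
proc univariate data=WORK.IMPORT normal mu0=0;
    ods select TestsForNormality;
    class loan_status;
    var fico_range_high;
run;
/* Two-sided independent-samples t test */
proc ttest data=WORK.IMPORT sides=2 h0=0 plots(showh0);
    class loan_status;
    var fico_range_high;
run;

/*** H2: interest rate by loan status ***/
/* Test for normality within each loan status group */
proc univariate data=WORK.IMPORT normal mu0=0;
    ods select TestsForNormality;
    class loan_status;
    var int_rate;
run;
/* Two-sided independent-samples t test */
proc ttest data=WORK.IMPORT sides=2 h0=0 plots(showh0);
    class loan_status;
    var int_rate;
run;

/*** H3: loan status by loan grade (chi-square test of independence) ***/
proc freq data=WORK.IMPORT;
    tables (loan_status) * (loan_grade) / chisq measures nopercent norow nocum
        plots(only)=(freqplot mosaicplot);
run;

/*** H4: FICO score vs. interest rate (Pearson correlation) ***/
/* NOPROB is left out so that the p-values shown in Table 9 are printed */
proc corr data=WORK.IMPORT pearson nosimple plots=none;
    var fico_range_high fico_range_low;
    with int_rate;
run;

/*** H5: loan amount across employment years (one-way ANOVA) ***/
proc glm data=WORK.IMPORT;
    class emp_length;
    model loaned_amt=emp_length;
    /* Levene's test for equal variances and Welch's ANOVA */
    means emp_length / hovtest=levene welch plots=none;
    /* Tukey-adjusted pairwise comparisons of group means */
    lsmeans emp_length / adjust=tukey pdiff alpha=.05;
run;
quit;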
```