Assignment Solution: Part I - Hypothesis Testing
In this part of the Hypothesis Testing assignment, we explore the fundamental concepts of hypothesis testing and statistical significance. We distinguish the Type I error from the significance level, explain how continuous distributions are divided into rejection and non-rejection regions, and review the pnorm() and qnorm() functions. Additionally, we examine the Chi2 test, how it differs from Analysis of Variance (ANOVA), and the statistical distribution appropriate to each. These concepts lay the foundation for hypothesis testing and statistical analysis.
Type I Error and Significance Level
- Type I Error: Occurs when a true null hypothesis is falsely rejected (a "false positive").
- Significance Level (α): The probability of committing a Type I error; it sets the threshold of sample evidence required to reject the null hypothesis.
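The link between the two can be checked by simulation: if the null hypothesis is true and we test at α = 0.05, we should falsely reject about 5% of the time. A minimal Python sketch (illustrative only; it assumes a one-sample z-test with known σ = 1 and made-up simulation settings):

```python
import random
from statistics import NormalDist, mean

# Simulate a two-sided z-test under a TRUE null hypothesis (mu = 0):
# the fraction of false rejections should be close to alpha.
random.seed(42)
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, ~1.96

n, n_sim = 30, 2000
rejections = 0
for _ in range(n_sim):
    sample = [random.gauss(0, 1) for _ in range(n)]  # null is true
    z = mean(sample) / (1 / n ** 0.5)                # known sigma = 1
    if abs(z) > z_crit:
        rejections += 1                              # a Type I error

print(rejections / n_sim)  # close to 0.05
```

The observed rejection rate hovers around α, which is exactly what "significance level" means.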
Division of Continuous Distributions
- We split continuous distributions, such as the normal distribution, into rejection and non-rejection regions based on the chosen significance level.
- The critical value is the point that divides the probability distribution curve into these two regions: rejection and non-rejection.
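For a two-sided test at α = 0.05, the critical values of the standard normal distribution can be computed directly; here is a short Python sketch using the stdlib `statistics.NormalDist` (an analogue of R's qnorm, used here for illustration):

```python
from statistics import NormalDist

# Split the standard normal into rejection / non-rejection regions
# for a two-sided test at significance level alpha.
alpha = 0.05
lower = NormalDist().inv_cdf(alpha / 2)       # left critical value
upper = NormalDist().inv_cdf(1 - alpha / 2)   # right critical value

print(round(lower, 2), round(upper, 2))  # -1.96 1.96

# A test statistic beyond either critical value falls in the rejection region.
def in_rejection_region(z):
    return z < lower or z > upper
```

Everything between the two critical values is the non-rejection region; the two tails beyond them together carry probability α.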
pnorm() and qnorm() Functions
- pnorm() function: Returns the cumulative probability (CDF value) of the normal distribution at a given quantile.
- qnorm() function: Returns the quantile of the normal distribution at a given cumulative probability.
- In essence: pnorm() is the cumulative distribution function and qnorm() is the quantile (inverse CDF) function of a normal distribution; the two are inverses of each other.
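The same pair of functions exists in Python's standard library; a minimal sketch, assuming `statistics.NormalDist.cdf` stands in for pnorm() and `inv_cdf` for qnorm():

```python
from statistics import NormalDist

# Python analogues of R's pnorm()/qnorm():
#   pnorm(q) -> NormalDist(mu, sigma).cdf(q)
#   qnorm(p) -> NormalDist(mu, sigma).inv_cdf(p)
std = NormalDist(mu=0, sigma=1)

p = std.cdf(1.96)       # cumulative probability at quantile 1.96 (~0.975)
q = std.inv_cdf(0.975)  # quantile at cumulative probability 0.975 (~1.96)

# The two functions undo each other:
print(round(std.inv_cdf(std.cdf(1.0)), 6))  # 1.0
```

The round trip quantile → probability → quantile returns the starting value, which is what "qnorm is the inverse of pnorm" means in practice.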
Chi2 Test vs. ANOVA
- Chi2 Test: Tests whether there is a significant association between two categorical variables. It works solely with categorical variables.
- ANOVA (Analysis of Variance): Used when comparing a continuous dependent variable across the groups of at least one categorical independent variable.
Appropriate Statistical Distributions
- Chi2 Test: Uses the chi-square distribution, a continuous probability distribution whose shape depends on its degrees of freedom, denoted as "k."
- ANOVA: Utilizes the F-distribution to assess whether three or more samples come from populations with the same mean.
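The Chi2 statistic itself is simple to compute by hand from a contingency table. A short Python sketch (the 2x2 table below is made up for illustration):

```python
# Chi-square statistic for a 2x2 contingency table of two categorical variables.
observed = [[10, 20],
            [20, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / total  # expected count under independence
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # degrees of freedom k
print(round(chi2, 3), df)  # 6.667 1
```

The statistic is then compared against the chi-square distribution with k degrees of freedom (here k = 1) to decide whether the association is significant.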
Assignment Solution: Part II - Correlation and Regression
In this segment of the assignment, we explore concepts related to correlation, regression, and evaluation metrics for regression models. We differentiate covariance from correlation, discuss when to use Spearman's ρ and Kendall's τ coefficients, and distinguish the correlation coefficient from the regression coefficient. We also delve into the meaning and workings of OLS (Ordinary Least Squares) and the metrics used to evaluate the goodness-of-fit of a linear regression model.
Covariance and Correlation
- Covariance: Measures the direction of the joint variability between two random variables. Its magnitude depends on the variables' units, so values are not directly comparable across datasets.
- Correlation: Measures the strength and direction of the relationship between two variables on a standardized scale from -1 to +1; it can be positive or negative.
Spearman's ρ and Kendall's τ
- Spearman's ρ: A rank-based (non-parametric) correlation coefficient, used when analyzing ranked variables or monotonic relationships.
- Kendall's τ: Also non-parametric; ideal when the assumptions of Pearson's correlation are not met, or when dealing with ordinal (non-continuous) data.
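Both coefficients can be computed from their textbook formulas. A short Python sketch, assuming made-up rank data with no ties (the no-ties formulas below do not handle tied ranks):

```python
from itertools import combinations

x_ranks = [1, 2, 3, 4, 5]
y_ranks = [1, 3, 2, 4, 5]
n = len(x_ranks)

# Spearman's rho (no ties): 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))
d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
rho = 1 - 6 * d2 / (n * (n ** 2 - 1))

# Kendall's tau: (concordant pairs - discordant pairs) / total pairs
c = d = 0
for (xi, yi), (xj, yj) in combinations(zip(x_ranks, y_ranks), 2):
    if (xi - xj) * (yi - yj) > 0:
        c += 1  # pair ordered the same way in both variables
    else:
        d += 1  # pair ordered oppositely
tau = (c - d) / (n * (n - 1) / 2)

print(rho, tau)  # 0.9 0.8
```

Both coefficients reward agreement in ordering rather than in raw values, which is why they survive violations of Pearson's linearity and normality assumptions.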
Correlation Coefficient vs. Regression Coefficient
- Correlation Coefficient: Measures the degree of relationship between two variables.
- Regression Coefficient: Measures how much the dependent variable changes for a one-unit change in an independent variable within a regression model.
Ordinary Least Squares (OLS)
- OLS (Ordinary Least Squares): A linear regression technique used for estimating the unknown parameters of a model.
- Works by minimizing the sum of squared residuals between actual and predicted values in a model.
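For simple linear regression, the OLS estimates have a closed form derived from minimizing the sum of squared residuals. A minimal Python sketch with made-up data:

```python
from statistics import mean

# Closed-form OLS for simple linear regression y = intercept + slope * x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

x_bar, y_bar = mean(x), mean(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# These estimates minimize the sum of squared residuals:
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(round(slope, 3), round(intercept, 3))  # 1.95 0.15
```

Any other (slope, intercept) pair would produce a larger sum of squared residuals on this data, which is precisely the "least squares" criterion.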
Metrics for Regression Model Evaluation
- Mean Squared Error (MSE): The average of the squared differences between actual and predicted values; it assesses how closely the regression line fits the data points, providing a measure of goodness-of-fit.
- Root Mean Squared Error (RMSE): The square root of the MSE; because it is expressed in the same units as the dependent variable, it makes the typical prediction error easier to interpret.
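Both metrics follow directly from their definitions; a short Python sketch with made-up actual and predicted values:

```python
from math import sqrt

actual    = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]

errors = [a - p for a, p in zip(actual, predicted)]
mse = sum(e ** 2 for e in errors) / len(errors)  # mean of the squared errors
rmse = sqrt(mse)                                 # same units as the target variable

print(round(mse, 3), round(rmse, 3))  # 1.667 1.291
```

Note that squaring penalizes large errors disproportionately: the single error of 2 contributes four times as much to the MSE as an error of 1 would.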
By understanding these concepts, you will be well-equipped to handle hypothesis testing, correlation, regression, and the evaluation of regression models.