# Using STATA for Data Analysis

Econometrics is the application of quantitative statistical methods to test existing hypotheses, develop new theories in economics, and predict future trends from past data. It uses economic theory, statistical inference, and mathematics to quantify economic phenomena and turn theoretical economic models into useful tools for decision-making. Economists use econometric tools to answer theoretical questions in various economic fields, make business decisions, and inform public policy debates.

### Descriptive Statistics

An experiment was designed to see whether a specific treatment changes the odds of graduating, of graduating in a STEM field, or of being retained in the educational system. A group of 123 students was randomly assigned to either the treatment or the control group. Data were collected on gender, Pell grant status, ACT scores (Composite and SCI REAS), and GPA. Information was then collected on three outcomes: graduating with a bachelor's degree within 6 years of enrollment, graduating with a bachelor's degree in a STEM field within 6 years of enrollment, and retention. These three variables are the outcomes of our analysis.
The treatment is the variable of interest. All other variables are treated as potential confounders and enter the models as controls.
Logistic models were run to capture the effect of the treatment.
A follow-up hypothesis, that receiving a Pell grant has a different impact on male and female students, is then derived and tested.
Table 1 describes the data: there are 123 cases, 59% are female, 62% graduated with a bachelor's degree, and the retention rate is .80.

Table 1. Descriptive statistics

### Logistic Regression

Table 2. Logistic models

The logit models in Table 2 show the estimated effects. For all three dependent variables, GPA has a positive and significant effect. For instance, a one-unit increase in GPA multiplies the odds of graduating by 5.68. The effect is significant at p<.001. GPA has similar effects on graduating in a STEM field and on retention.
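
To make the odds-ratio reading above more concrete, the following Python sketch converts the reported odds ratio of 5.68 into a logit coefficient and into a change in probability. It is only an illustration: the baseline probability of 0.62 is an assumption (the overall sample graduation rate from Table 1 is borrowed as a starting point), and the helper name `apply_odds_ratio` is hypothetical, not part of the original analysis.

```python
import math


def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Multiply the baseline odds by odds_ratio and convert back to a probability."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Odds ratio for GPA reported in the text (graduation model).
or_gpa = 5.68

# A logit coefficient is the natural log of the odds ratio.
beta_gpa = math.log(or_gpa)

# Assumption: use the overall graduation rate (62%) as an illustrative baseline.
baseline_prob = 0.62
prob_after = apply_odds_ratio(baseline_prob, or_gpa)

print(f"logit coefficient: {beta_gpa:.3f}")
print(f"probability after a one-unit GPA increase: {prob_after:.3f}")
```

The same odds ratio implies a different change in probability at a different baseline, which is why odds ratios and marginal effects are usually reported side by side.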

ACT Composite has no significant effect, while a higher ACT SCI REAS score is associated with the expected increase in the likelihood of graduating with a bachelor's degree in a STEM field.

Gender and Pell grant status make no significant difference to any outcome once the other variables are controlled for.

The treatment proves to be significant with respect to graduation with a bachelor's degree in a STEM field within 6 years of enrollment. More precisely, for those exposed to the treatment, the odds of graduating in a STEM field are 2.25 times higher. The estimate is only marginally significant (p<.10), however, so it is imprecise and its confidence interval is wide. Figure 1 illustrates these relations. The confidence intervals overlap, which is consistent with the weak or non-significant effects.

A larger sample would tend to produce smaller standard errors, so some effects that are not significant here might become significant. This cannot be verified with the data at hand, however, so the conclusions must rest on the current results.

Figure 1. Marginal effects of the treatment (for the models in Table 2)
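
The remark about sample size and standard errors can be sketched numerically. The Python snippet below builds a 95% confidence interval for the treatment odds ratio of 2.25 on the log-odds scale and then shrinks the standard error as if the sample were four times larger. The standard error of 0.45 is a made-up value chosen only so that the estimate is significant at p<.10 but not at p<.05; it is not taken from Table 2. The 1/sqrt(n) scaling is the usual approximation and assumes the point estimate stays the same, which a real larger sample would not guarantee. The function names are hypothetical.

```python
import math


def odds_ratio_ci(odds_ratio: float, se_log_or: float, z: float = 1.96):
    """Confidence interval for an odds ratio, computed on the log scale."""
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


def scaled_se(se: float, n_old: int, n_new: int) -> float:
    """Approximate standard error after changing the sample size (SE ~ 1/sqrt(n))."""
    return se * math.sqrt(n_old / n_new)


or_treatment = 2.25      # odds ratio reported in the text
se_hypothetical = 0.45   # assumption: illustrative SE of the log odds ratio

# Interval with the current sample size (123 students).
ci_current = odds_ratio_ci(or_treatment, se_hypothetical)

# Interval if the sample were four times larger and the estimate unchanged.
se_larger = scaled_se(se_hypothetical, n_old=123, n_new=4 * 123)
ci_larger = odds_ratio_ci(or_treatment, se_larger)

print("95% CI, n = 123:", tuple(round(x, 2) for x in ci_current))
print("95% CI, n = 492:", tuple(round(x, 2) for x in ci_larger))
```

Under these assumed numbers the first interval contains 1 (no effect) and the second does not, which is the sense in which a larger sample might turn a marginal effect into a significant one.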

One may also consider a reduced model, not including ACT scores. Table 3 introduces the corresponding results for the three outcome variables.

Table 3. Reduced logistic models

One may notice the persistent effect of GPA (significant at p<.01 for all three dependent variables). This time, the treatment proves to be significant for all three outcomes. Being in the treatment group is positively associated with a higher probability of graduating in general (p<.05) and of graduating in STEM (p<.10), and it multiplies the odds of retention by 2.62 (p<.05). Figure 2 illustrates these results as marginal effects. When ACT scores are not controlled for, the effect of the treatment appears larger.

Figure 2. Marginal effects of the treatment (for the models in Table 3)

Table 4 explores the additional hypothesis. An interaction effect between gender and Pell grant status was added to the models. A significant effect appears only for graduation in a STEM field. In this case, being a Pell recipient increases the odds of graduating in STEM, and being female also has a positive effect. However, the interaction term is negative: being simultaneously female and a Pell recipient decreases the odds of graduating in STEM.
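
The way main effects and an interaction term combine in a logit model can be sketched as follows. On the odds-ratio scale the terms multiply, so the odds for a female Pell recipient relative to the reference group (male, no Pell grant) are the product of the two main-effect odds ratios and the interaction odds ratio. The three odds ratios below are invented for illustration and are not the values from Table 4; the function name `relative_odds` is likewise hypothetical.

```python
def relative_odds(female: int, pell: int,
                  or_female: float, or_pell: float, or_interaction: float) -> float:
    """Odds relative to the reference group (female=0, pell=0) in a logit model
    with two binary predictors and their interaction."""
    return (or_female ** female) * (or_pell ** pell) * (or_interaction ** (female * pell))


# Assumption: illustrative odds ratios, NOT the estimates from Table 4.
OR_FEMALE = 2.0        # main effect of being female (> 1: positive)
OR_PELL = 3.0          # main effect of being a Pell recipient (> 1: positive)
OR_INTERACTION = 0.2   # female x Pell interaction (< 1: negative)

for female in (0, 1):
    for pell in (0, 1):
        odds = relative_odds(female, pell, OR_FEMALE, OR_PELL, OR_INTERACTION)
        print(f"female={female}, pell={pell}: odds relative to reference = {odds:.2f}")
```

With these assumed values, female Pell recipients end up with lower relative odds than either female non-recipients or male recipients, even though both main effects are positive.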

Table 4. Logistic models with interactions between gender and Pell grants

Further developments may include interaction effects of the treatment with GPA, Pell grant status, ACT Sci, and gender.

Tables 5-8 introduce these models, considering one interaction at a time. We notice no significant interaction effect with GPA (Table 5), ACT Sci scores (Table 7), or gender (Table 8). However, in the second model of Table 6, being a Pell recipient increases the odds of graduating in STEM, which suggests that Pell grants are more likely to be effective for students in the hard sciences.

Table 5. Logistic models with interactions between GPA and treatment

Table 6. Logistic models with interactions between PELL and treatment

Table 7. Logistic models with interactions between ACT SCI and treatment

Table 8. Logistic models with interactions between gender and treatment

One may also look at a reduced model that includes significant effects only. Table 9 reproduces Table 6, but only GPA, ACT Sci Reasoning, and the interaction between Pell grant status and group (intervention/control) are retained as predictors. GPA and ACT Sci Reasoning prove to be significant in predicting the likelihood of graduating; GPA and the interaction matter for graduating in STEM; and the retention model has no significant predictor.

Table 9. Logistic models with interactions between PELL and treatment: reduced models

Further, one may test the stability of the models by considering only those students who have an Expected Family Contribution (EFC) (106 out of 123). Table 10 reproduces the models in Table 9, but the population includes only students for whom EFC is not null, and the interaction considered is EFC with treatment. The results are roughly similar with respect to GPA, although in the retention model GPA is only weakly significant (p<.10). The interaction term is not significant in any model. ACT Sci Reasoning has a weak effect only on the likelihood of graduating.

Table 10. Logistic models with interactions EFC#treatment: reduced models, cases with EFC only