# Using STATA for Data Analysis

### Regression Analysis

### Table 1: Summary Statistics

| Variable | Label (coding) | Mean | Standard Deviation |
| --- | --- | --- | --- |
| female | 1 if female; 0 if male | 0.44 | 0.4968841 |
| never_married | 1 if never married; 0 otherwise | 0.486 | 0.5003045 |
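As a minimal sketch, the two indicators in Table 1 could be built and summarized as follows. The codings `gender == 2` for female and `marstat == 5` for never married are assumptions, not taken from the source, so check the actual codes with `codebook` first.

```stata
* Inspect the raw codings before recoding
codebook gender marstat

* Assumed codings (hypothetical): gender 2 = female, marstat 5 = never married
generate female = (gender == 2) if !missing(gender)
generate never_married = (marstat == 5) if !missing(marstat)

* Mean and standard deviation of each indicator
summarize female never_married
```

The `if !missing()` qualifier keeps respondents with a missing raw value missing in the indicator instead of silently coding them as 0.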

a) numchildren and gender

H0: the average number of children a respondent has is the same irrespective of gender

Ha: the average number of children a respondent has differs by gender

Here, the average number of children is expected to be the same for male and female respondents unless there is selection bias, because every child has one male and one female parent.

b) numchildren and marstat

H0: the average number of children a respondent has is the same irrespective of marital status

Ha: the average number of children a respondent has differs by marital status

Here, the null hypothesis is expected to be rejected, as people with different marital statuses may have different numbers of children (in particular, never-married respondents might have fewer children on average).

c) numchildren and birthyear

H0: the average number of children a respondent has is the same irrespective of birth year

Ha: the average number of children a respondent has differs by birth year

Here too, the null hypothesis is expected to be rejected, as respondents with a later birth year (younger) might have fewer children than respondents with an earlier birth year (older).

d) numchildren and faminc_new

H0: the average number of children a respondent has is the same irrespective of family income level

Ha: the average number of children a respondent has differs by family income level

Here too, the null hypothesis is expected to be rejected, as respondents with higher family income might be more willing to have more children because they can better afford them.
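The source does not say which tests were used for hypotheses a) to d). One plausible way to check them in Stata is sketched below; the choice of test for each pair is an assumption, and the variable name `birthyr` follows the regression tables.

```stata
* a) Two-sample t test of mean numchildren by gender (requires exactly two groups)
ttest numchildren, by(gender)

* b) One-way ANOVA of numchildren across marital-status categories
oneway numchildren marstat, tabulate

* c) and d) Bivariate regressions on birth year and family income level
regress numchildren birthyr
regress numchildren faminc_new
```

In each case, a p-value below 0.05 would lead to rejecting H0 at the 5% level.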

### Table 2: OLS Regression Results


| Variable | Regression coefficient | P > \|t\| | Statistically significant? | Interpretation |
| --- | --- | --- | --- | --- |
| gender | 0.291274 | 0.061 | No | Male respondents had on average 0.29 more children than female respondents, but the difference is not statistically significant at the 5% level and is therefore disregarded. |
| never_married | -1.509981 | 0.000 | Yes | Respondents who were never married had on average 1.51 fewer children than other respondents. |
| birthyr | -0.017774 | 0.000 | Yes | The number of children decreases by about 0.02 for each one-year increase in birth year. This is expected, as older respondents tend to have more children. |
| faminc_new | -0.0488778 | 0.041 | Yes | Respondents tend to have fewer children on average as their income level rises, by about 0.05 per income level. The effect is small and only marginally significant. |
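A model like the one in Table 2 could be fitted with a single `regress` call. This is a sketch that assumes the dataset holds the variables under the names listed in the table.

```stata
* OLS of number of children on the four predictors listed in Table 2
regress numchildren gender never_married birthyr faminc_new
```

In the output, the `Coef.` and `P>|t|` columns supply the second and third columns of the table.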

#### Data Analytics

The two plots (the residual plot and the residual histogram) show that the residuals are approximately normally distributed, but their dispersion is not constant across the values of the independent variables. The homoskedasticity assumption therefore appears to be violated in this model. This may be because all the variables are discrete rather than continuous. The heteroskedasticity test points the same way (p < 0.05). The non-constant variance might also be due to variables omitted from the regression that affect the dependent variable.
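The diagnostics described here can be reproduced roughly as follows. The source does not name the heteroskedasticity test, so the use of `estat hettest` (Breusch-Pagan / Cook-Weisberg) is an assumption, as is the name `resid` for the stored residuals.

```stata
* Refit the OLS model so the postestimation commands have results to work on
quietly regress numchildren gender never_married birthyr faminc_new

* Residual-versus-fitted plot: a fan or funnel shape suggests heteroskedasticity
rvfplot, yline(0)

* Histogram of residuals with a normal density overlaid
predict resid, residuals
histogram resid, normal

* Breusch-Pagan / Cook-Weisberg test; a small p-value rejects constant variance
estat hettest

* One common response: keep the OLS coefficients but use robust standard errors
regress numchildren gender never_married birthyr faminc_new, vce(robust)
```

Robust standard errors leave the coefficients unchanged and adjust only the inference, so the significance column of the OLS table may shift.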

The variable “children” can be generated by setting it to 0 when numchildren = 0 and to 1 when numchildren != 0.
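In Stata this recode might look like the sketch below. The guard against missing values is an addition the source does not mention; it matters because Stata treats missing as larger than any number, so `numchildren != 0` alone would code missing responses as 1.

```stata
* 1 if the respondent has at least one child, 0 if none; missing stays missing
generate children = (numchildren != 0) if !missing(numchildren)

* Check the recode against the original variable
tabulate numchildren children, missing
```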

### Table 3: Logistic Regression Results

| Variable | Regression coefficient (odds ratio) | P > \|t\| | Statistically significant? | Interpretation |
| --- | --- | --- | --- | --- |
| female | 0.678909 | 0.102 | No | Female respondents had only 0.679 times the odds of male respondents of having a child, but the difference is not statistically significant at the 5% level and is therefore disregarded. |
| never_married | 0.1781378 | 0.000 | Yes | Respondents who were never married had only 0.178 times the odds of other respondents of having a child. |
| birthyr | 0.9810059 | 0.013 | Yes | The odds that a respondent has a child are 0.98 times those of a respondent born one year earlier. |
| faminc_new | 0.9214551 | 0.027 | Yes | The odds that a respondent has a child are 0.92 times those of a respondent one income level lower. |
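Odds ratios like those in Table 3 can be obtained by adding the `or` option to `logit`. This is a sketch that assumes the `children`, `female` and `never_married` indicators already exist in the dataset.

```stata
* Logistic regression reporting odds ratios instead of log-odds coefficients
logit children female never_married birthyr faminc_new, or
```

An odds ratio below 1 means lower odds of having a child. For example, 0.178 corresponds to odds about 82% lower, since (1 - 0.178) × 100 ≈ 82.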