New Variable - Statalist

Clyde Schechter

Join Date: Apr 2014

Posts: 29809
#16

30 Mar 2022, 10:25

Yes, if country is already a numeric variable, that's the correct code. When you wrote

but do have the names for all the countries within the dataset

I took that to mean that it was a string variable.
Comment
Taiba Chau

Join Date: Feb 2022

Posts: 105
#17

31 Mar 2022, 04:25

So overall, my code would be:
regress mental_health i.n_country##i.wave##i.age, cluster(Country)

Within the regression I could also include other variables, such as gender, education level, etc. So I could then also run:

regress mental_health i.n_country##i.wave##i.age gender education_level, cluster(Country)

margins n_country#wave

margins n_country, dydx(wave)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29809
#18

31 Mar 2022, 10:34

Yes, but pay attention to the details on the other variables. Whether you specify gender or i.gender will not affect the quantitative results, but you will get better, more convenient labeling of the output if you use i.gender, as whatever value label is attached to your gender variable will be drawn upon.

For education level this is more important. Unless you wish to treat education level as a continuous variable, it is important to use the i. prefix so that you get separate coefficients for each level.

I thought we established in #15 that Country is, itself, already a numeric variable, so there is no need to create n_country. Just use Country itself in the commands.
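
Putting these points together, the commands might look like the following sketch. The variable names are copied from #17 and are assumed to match the dataset, and education_level is assumed to be categorical rather than continuous. The option vce(cluster Country) is the current syntax for what #17 wrote as cluster(Country).

```stata
* Sketch only: variable names follow #17 and may differ in the actual data.
* i.gender picks up the value label; i.education_level gives one
* coefficient per level instead of a single linear slope.
* Country is used directly because it is already numeric.
regress mental_health i.Country##i.wave##i.age i.gender i.education_level, vce(cluster Country)

margins Country#wave
margins Country, dydx(wave)
```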
Comment
Taiba Chau

Join Date: Feb 2022

Posts: 105
#19

31 Mar 2022, 14:42

Understood. Thank you. I have run this regression, but for some reason the F statistic does not appear. A dot is shown in its place. I was wondering how to solve this.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29809
#20

31 Mar 2022, 15:06

You will need to show the complete regression output you got from Stata (the output directly from the -regress- command, not any secondary output from subsequent processing by programs like -esttab- or -estout-) to get advice on this.
Comment
Taiba Chau

Join Date: Feb 2022

Posts: 105
#21

03 Apr 2022, 09:03

This is the output from the regression:

Here I am testing the mental well-being of an individual, depending on whether they are coloured or not, across the set of countries after a migration policy was imposed.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29809
#22

03 Apr 2022, 11:28

When you use cluster robust standard errors, the denominator degrees of freedom is not based on the sample size but is, instead, the number of clusters minus 1. In your case, there are 10 clusters (countries), so you have only 9 df. That means that no hypothesis involving more than 8 linear contrasts is possible. The overall model F statistic is a test of the joint hypothesis that all of the coefficients (except the constant) are zero. You have 45 (if I have not miscounted) of those, so you are way beyond that here.

Now, this is usually not a problem. Was it among your research goals to test that joint hypothesis? Probably not. If nothing else, I imagine that the demographics, at least, were included only to deal with possible omitted variable bias and you have no interest in their effects. In modern research this is nearly always the case: many variables are included to adjust for their effects, not to test them. So the overall model F-test is really just a historical relic of the days when models often consisted of just the key test variables, with few or no covariates adjusted for. So you can ignore that.

But I'll tell you what is a problem. With only 10 countries, you should not be using cluster robust standard errors. Clustered standard errors require a larger number of clusters in order to work. When the number is small, as here, they are often actually worse than the unclustered ones.
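
One way to see the degrees-of-freedom point directly is to inspect the stored results after the clustered regression. This is only a sketch: the variable names (Mental_Health, Wave, ethnic) are assumed from later posts in this thread, and the covariates are left out for brevity.

```stata
* Sketch only: variable names are assumed from later posts in the thread.
regress Mental_Health i.Country##i.Wave##i.ethnic, vce(cluster Country)

* With cluster-robust standard errors, the denominator df reported by
* Stata is the number of clusters minus 1, not a function of sample size.
display "Number of clusters:  " e(N_clust)
display "Denominator df:      " e(df_r)
```

With 10 countries as clusters, the second number should be 9, as described above.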
Comment
Taiba Chau

Join Date: Feb 2022

Posts: 105
#23

03 Apr 2022, 16:50

Right, I understand that. I should not be clustering on country, then. Without clustering, the coefficients would not really change. The interpretation of the coefficients remains the same, where the interaction term between country#wave#ethnic is of interest. It would show the difference in mental well-being for ethnic people after the migration policy, that is, the difference between an ethnic and a non-ethnic individual. I ran the same regression without clustering and got the following:

Would this lead to similar interpretations? I also see that the F statistic now appears.

Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29809
#24

03 Apr 2022, 17:06

Yes, the F-statistic reappears because, without clustered standard errors, your degrees of freedom for test statistics have soared to 28,120, so there is plenty of room for the joint null hypothesis test.

I don't know what you mean when you refer to "the interaction term between the country#wave#ethnic [emphasis added]" as you have 9 of them. The interpretation of these coefficients is complicated. If what you are interested in is the difference in mental health between ethnic = 1 and ethnic = 0 observations in each country during wave 2, you would be best off getting that from -margins-:

Code:

margins Country, dydx(ethnic) at(wave = 2)
Comment
Taiba Chau

Join Date: Feb 2022

Posts: 105
#25

04 Apr 2022, 14:46

Sorry, that is what I meant. There are 9 interaction terms for country##wave##ethnic, and I got the following using the code mentioned previously:

For example, if I wanted to interpret the coefficient for Great Britain, it would be the mental well-being of ethnic individuals is 0.217 greater than the mental wellbeing of a non-ethnic individual
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29809
#26

04 Apr 2022, 15:11

For example, if I wanted to interpret the coefficient for Great Britain, it would be the mental well-being of ethnic individuals is 0.217 greater than the mental wellbeing of a non-ethnic individual

I'm not sure where you're getting that 0.217 from. The value in the Great Britain row of the dy/dx column is 0.2196188. If you round that to 3 decimal places you get 0.220. And you could report that as the expected difference in mental wellbeing score of an ethnic person and a non-ethnic person in Great Britain. Also, if it were me, I would only report these results to 2 decimal places. Either way, you should, of course, also report the uncertainty in this estimate using either the standard error or the confidence interval.
Comment
Taiba Chau

Join Date: Feb 2022

Posts: 105
#27

04 Apr 2022, 15:17

Sorry, again that is what I meant. I was wondering what the difference would be if I then ran this regression for a non-ethnic individual. So instead of having i.ethnic in the model, I would run:
regress Mental_Health i.Country##i.Wave##i.non_ethnic Gender Age Marital People_IN_House Educ Employ Inc

Would it be wrong to do this?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29809
#28

04 Apr 2022, 15:24

Assuming that this non_ethnic variable is just 1 whenever ethnic = 0 and 0 whenever ethnic = 1, you will get the same results you got before, except that the results for ethnic and any interactions including it will have their signs reversed. The marginal effects of non_ethnic will just be the negative of the values you got for the marginal effects of ethnic. If you want to do it for fun, go ahead. But it adds no value to the analysis.
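
If you want to verify this yourself, a sketch along the following lines should do it. It assumes ethnic is coded 0/1 and that the variable names match those in #24 and #27. Stata is case-sensitive, so adjust Wave or wave to whatever the dataset actually uses.

```stata
* Sketch only: variable names follow #24 and #27 and may need adjusting.
* First confirm that non_ethnic really is the complement of ethnic.
assert non_ethnic == 1 - ethnic if !missing(ethnic, non_ethnic)

regress Mental_Health i.Country##i.Wave##i.ethnic Gender Age Marital People_IN_House Educ Employ Inc
margins Country, dydx(ethnic) at(Wave = 2)

regress Mental_Health i.Country##i.Wave##i.non_ethnic Gender Age Marital People_IN_House Educ Employ Inc
margins Country, dydx(non_ethnic) at(Wave = 2)
```

The two -margins- tables should match row by row apart from the sign of the dy/dx column.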
Comment
Taiba Chau

Join Date: Feb 2022

Posts: 105
#29

04 Apr 2022, 15:26

I have just done it and got exactly what you described, which is really interesting. I was wondering whether, with this setup, it is possible to run a triple DiD model or not. It is something I have come across while reading, but I am slightly confused about the setup.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29809
#30

04 Apr 2022, 15:32

I don't understand your question. What you are running is a triple DiD model, whether you use i.ethnic or i.non_ethnic.