time-fixed effects and standard errors

Nono Guedehoussou

Join Date: Jul 2017

Posts: 7
#1

06 Jun 2018, 11:53

Hello all,

I have a dataset that consists of 181 health facilities that were randomized into two groups: Treatment (T) and Control. The randomization was carried out at the health-facility level. Health facilities are nested in 8 health districts (ZS). Data on these health facilities were collected quarterly from 2012 to part of 2017, for 23 quarters in total. I would like to run a first model with time fixed effects and health-district fixed effects, since I cannot use health-facility fixed effects: my Treatment (the variable of interest) is time-invariant. Later on, I will interact Quarter and Treatment to see whether the treatment effect varies over time (second model).

I am confused about how to set up the equation and how to run it in Stata with the right clustering for the first model. My idea for the equation is the following:
Y_it = bT + V_t + X_i, where T is the treatment indicator, X_i indicates whether the health facility is located in a rural or urban area, and V_t are the quarter dummies. My question here is how to include the health districts (ZS) in this equation. If I add U_i for ZS, it seems wrong because "i" indexes the health facility.
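One possible way to write the first model with district effects, offered only as a sketch: let d(i) denote the district that facility i belongs to, so the district effect does not need its own "i" subscript. The symbols d(i), \delta, \gamma, \beta and \varepsilon_{it} are notation introduced here for illustration and are not from the original post.

```latex
Y_{it} = \beta T_i + \gamma X_i + \delta_{d(i)} + V_t + \varepsilon_{it}
```

Here \delta_{d(i)} is a set of district (ZS) dummies, V_t the quarter dummies, and \beta the treatment effect of interest. Because T varies across facilities within a district, it is not collinear with the district dummies.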

Second, to run the code in Stata, I did the following:
1. xtset facility_id
2. xtreg y T i.Quarter i.ZS, fe vce(cluster ZS). This gives me an error that panels are not nested within clusters. If I cluster on facility_id, T is omitted (rightly so).
So how do I set it up to get the correct standard errors?
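For reference, T drops out of -xtreg, fe- because it does not vary within facility, whichever cluster variable is used. A minimal sketch of one alternative, assuming the goal is quarter and district dummies without facility fixed effects, is pooled OLS with standard errors clustered on the facility. The variable rural is a hypothetical 0/1 indicator standing in for X; T is assumed to be coded 0/1.

```stata
* Sketch only: pooled OLS with quarter and district dummies.
* y, T, Quarter, ZS and facility_id are the names used in the post;
* rural is a hypothetical 0/1 indicator for the urban/rural covariate X.
regress y i.T i.rural i.Quarter i.ZS, vce(cluster facility_id)
```

Since no facility effects are absorbed, the coefficient on T is estimated, and clustering on facility_id matches the level at which treatment was randomized.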

I would appreciate any help to understand how to correctly specify the model and run it in Stata (earlier version)

Thank you very much in advance,
Nono

Last edited by Nono Guedehoussou; 06 Jun 2018, 11:54. Reason: Typo
Tags: fixed effects, panel data
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

06 Jun 2018, 11:57

Nono:
welcome to this forum.
You do not specify why you decided to perform a fixed-effects panel data regression rather than a random-effects one.
If, as you state, health facilities are actually nested within health districts, I would consider -mixed-.

Kind regards,
Carlo
(Stata 19.0)
Nono Guedehoussou

Join Date: Jul 2017

Posts: 7
#3

06 Jun 2018, 12:23

Carlo:
I thought that, since I was worried about omitted variables, I should use fixed effects to control for them. I am sorry, but I do not understand the implication of my health facilities being nested in health districts, and why I should therefore consider -mixed-. By -mixed-, do you mean a multilevel or hierarchical model? Thank you.
Nono Guedehoussou

Join Date: Jul 2017

Posts: 7
#4

06 Jun 2018, 12:24

Also, I do not have any variables at the health-district level. I just want to control for the district. Thanks.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#5

06 Jun 2018, 23:50

Nono:
- the -fe- machinery gets rid of time-invariant heterogeneity, observed and unobserved, by wiping out time-invariant predictors; however, if the omitted variables are time-varying, the problem remains;
- with -mixed- I meant a multilevel or hierarchical model;
- if there's evidence of nesting, -mixed- can allow for different intercepts at the district level and possibly different slopes, too;
- otherwise you can stay with -xtreg- (checking via -hausman- whether the -fe- specification outperforms the -re- one) and include -i.district- as a predictor.
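The two routes in the list above might look roughly like this. This is a sketch under assumptions, using the variable names from the original post (y, T, Quarter, ZS, facility_id) and assuming T is coded 0/1.

```stata
* Route 1 (sketch): three-level random-intercept model,
* quarterly observations within facilities within districts.
mixed y i.T i.Quarter || ZS: || facility_id:

* Route 2 (sketch): random-effects -xtreg- with district dummies
* entered as ordinary predictors and facility-clustered errors.
xtset facility_id
xtreg y i.T i.Quarter i.ZS, re vce(cluster facility_id)
```

With only 8 districts, the district-level variance component in the first route may be estimated imprecisely, which is one reason to prefer district dummies as in the second.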

Kind regards,
Carlo
(Stata 19.0)
Nono Guedehoussou

Join Date: Jul 2017

Posts: 7
#6

07 Jun 2018, 05:50

Carlo:
Thank you, I get your points. Suppose I stay with xtreg and include i.district and i.quarter: at what level do I cluster? As I said, vce(cluster facility_id) will omit the variable T that I am interested in. Is it then all right to cluster at the district level and explain why I cannot cluster on facility_id? And what about U_i, the first question in my original post? Thank you so much for the help.
Regards,
Nono
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#7

07 Jun 2018, 07:51

Nono:
- you should cluster on -panelid- (facility_id in your case);
- if you have many districts, you can use the -fvvarlist- notation for them (say, -i.ZS-), and you can use it for healthcare facilities in the same regression model as well.

Kind regards,
Carlo
(Stata 19.0)
Nono Guedehoussou

Join Date: Jul 2017

Posts: 7
#8

07 Jun 2018, 08:03

Dear Carlo:
I have done what you suggested, as described in my original post. The problem is that Treatment (T) does not vary over time; therefore, when I use vce(cluster facility_id), Treatment (T) is omitted, but I am interested in the estimate of Treatment.
Thanks.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#9

07 Jun 2018, 08:09

Nono:
install the user-written command -xtoverid- (type -search xtoverid- from within Stata to spot it) and test whether the -re- specification is more appropriate for your data.

Kind regards,
Carlo
(Stata 19.0)
Nono Guedehoussou

Join Date: Jul 2017

Posts: 7
#10

07 Jun 2018, 14:07

Carlo:
I tried xtoverid after running the following: xtreg y Treatment i.ZS, re cluster(facility_id). It gives the error "1b: operator invalid". I looked it up on the web and found a response saying that xtoverid is an old command that might not work with factor variables. So I reran with the following command: xtreg y Treatment, re cluster(e_id1), followed by xtoverid, and I get the following: 0.568 Chi-sq(1) P-value = 0.4512. I therefore conclude that I should use the random-effects model.
I need to read a little more on it to understand it correctly. I dropped i.ZS for the command to work and did not include i.Quarter (since it is basically a Hausman test, and I recalled that time dummies should not be included, but I need to double-check this). I welcome any advice, but thanks so much for the idea!
Nono

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17709

#11

08 Jun 2018, 01:53

Nono:
the workaround to make pretty old user-written commands accept -fvvarlist- is to prefix the whole code with -xi:-, as you can see from the following toy example:

Code:

use "http://www.stata-press.com/data/r15/nlswork.dta"
. xi: xtreg ln_wage age i.race
i.race            _Irace_1-3          (naturally coded; _Irace_1 omitted)

Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.1032                                         avg =        6.1
     overall = 0.0945                                         max =         15

                                                Wald chi2(3)      =    3242.34
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |    .018534    .000331    55.99   0.000     .0178852    .0191828
    _Irace_2 |  -.1209428   .0129079    -9.37   0.000    -.1462418   -.0956439
    _Irace_3 |   .0981941   .0538424     1.82   0.068    -.0073351    .2037233
       _cons |    1.15423   .0118069    97.76   0.000     1.131089    1.177371
-------------+----------------------------------------------------------------
     sigma_u |  .36581626
     sigma_e |  .30349389
         rho |  .59231394   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xtoverid

Test of overidentifying restrictions: fixed vs random effects
Cross-section time-series model: xtreg re  
Sargan-Hansen statistic  14.662  Chi-sq(1)    P-value = 0.0001

.

Kind regards,
Carlo
(Stata 19.0)

Nono Guedehoussou

Join Date: Jul 2017

Posts: 7
#12

08 Jun 2018, 08:05

Carlo,

Thank you. I did exactly what you suggested and got 27.175 Chi-sq(1) P-value = 0.000. I am therefore forced to use fe instead of re. But that brings me full circle, because I can never estimate my variable of interest (Treatment) in the 'fe' model since it is time-invariant. Thank you for all your help. What would you do if you were me? Thanks,
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#13

08 Jun 2018, 08:46

Nono:
you can consider going Mundlak (-search mundlak-).
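As a rough illustration of the Mundlak idea, and not necessarily what the commands found via -search mundlak- do internally: facility-level means of the time-varying regressors are added to a random-effects model, so that time-invariant variables such as Treatment stay estimable. The variable x_tv below is a hypothetical time-varying facility covariate that does not appear in this thread; the other names follow the earlier posts.

```stata
* Sketch only. x_tv is a hypothetical time-varying covariate.
* Step 1: facility-level mean of the time-varying regressor.
bysort facility_id: egen mean_x_tv = mean(x_tv)

* Step 2: random-effects model augmented with that mean.
xtset facility_id
xtreg y i.Treatment x_tv mean_x_tv i.Quarter i.ZS, re vce(cluster facility_id)

* Step 3: testing the coefficient on the mean acts as a
* cluster-robust analogue of the fixed-vs-random-effects test.
test mean_x_tv
```

If the model contains no time-varying covariates other than the quarter dummies, there is little for the facility means to capture, so this route mainly matters once such covariates are added.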

Kind regards,
Carlo
(Stata 19.0)