Issue with xthdidregress command in Stata 18

Thibaud Marcesse

Join Date: Aug 2020

Posts: 13
#1


14 May 2023, 11:26

Hi,

I am trying to use the xthdidregress command with panel data (Stata 18). I am studying the implementation of a poverty alleviation program in India, using elections as a 'treatment' (to determine whether constituencies held by the incumbent see more spending). I have panel data for 10 fiscal years, which I reshaped to long format before running xtset and xthdidregress. There are two elections in the dataset, so basically two 'treatments' (two time periods and, as a result, two cohorts). I created a treatment variable for each fiscal year, coded as 1 when the incumbent at the state level holds the constituency and 0 otherwise. I also reshaped these treatment variables, which is why I only type 'treatedac' rather than the full name for each fiscal year ('treatedac12', 'treatedac13', etc.).
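For readers following along, the wide-to-long step described above can be sketched as follows. This is a toy illustration, not the poster's actual data: the values are made up, only two fiscal years are shown, and fy is an assumed name for the fiscal-year variable.

```stata
* Toy wide data: one row per constituency, one column per fiscal year.
* Names and values are illustrative, not taken from the real dataset.
clear
input ac treatedac12 treatedac13 logtotpdac12 logtotpdac13
1 0 1 2.5 2.9
2 0 0 3.1 3.0
end

* Stack the yearly columns; fy (an assumed name) receives the 12/13 suffix
reshape long treatedac logtotpdac, i(ac) j(fy)

* Declare the panel structure: ac is the unit, fy is the time variable
xtset ac fy
list, sepby(ac)
```

After the reshape there is one row per constituency and fiscal year, which is the layout that xtset expects.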

Whenever I run xthdidregress, I get the following error, and I am not sure where it is coming from. I strongly suspect this has to do with the way I coded the treatment variable (or the way I am using it), but I am not sure why. Stata 18 should be able to handle heterogeneous treatment under this specification, right? When I look at the data editor, I see that Stata recognized the second treatment (the 2017 election), but not the first (the 2012 election).

Here's the code:

. xthdidregress ra (logtotpdac prop_poor) (treatedac), group(ac)
note: variable _did_cohort, containing cohort indicators formed by treatment variable treatedac and
group variable ac, was added to the dataset.
invalid treatment
The treatment is assumed to be staggered. Once a unit is treated, it should remain treated.
r(498);

Thanks for letting me know!
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 209
#2

14 May 2023, 12:31

Dear Thibaud,

-xthdidregress- looks at the treatment group, formed using -treatedac- and -ac-, to verify that you have a valid treatment. The pattern of the treatment within -ac- should be a run of zeros followed by ones, or a pattern of only zeros. If at some point there is a pattern of zeros, followed by ones, and then zeros, you will trigger this error message. In other words, once a unit is treated it remains treated and cannot revert to being a control. You can look at your -ac-, -treatedac-, your time variable, and _did_cohort to see what is happening in your data.
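The check described above can be sketched as follows. This is a minimal sketch on a toy panel: ac and treatedac mirror the names in the thread, while fy is an assumed name for the time variable, and the data values are invented.

```stata
* Toy panel; fy is an assumed name for the fiscal-year variable
clear
input ac fy treatedac
1 2011 0
1 2012 1
1 2013 1
2 2011 0
2 2012 1
2 2013 0
3 2011 0
3 2012 0
3 2013 0
end

* Flag any period where treatment drops from 1 back to 0 within a unit
bysort ac (fy): gen byte reverted = (treatedac < treatedac[_n-1]) ///
    if _n > 1 & !missing(treatedac, treatedac[_n-1])

* Carry the flag to every row of the offending unit
bysort ac: egen byte ever_reverted = max(reverted)

* Units listed here violate the staggered-adoption pattern
list ac fy treatedac if ever_reverted == 1, sepby(ac)
```

In the toy data only unit 2 is listed, because its treatment goes 0, 1, 0.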
Thibaud Marcesse

Join Date: Aug 2020

Posts: 13
#3

14 May 2023, 12:54

Thanks a lot, Enrique! So this basically means that I need to code the treatment variable as 1 for an observation 'treated' in 2012 (i.e., in a constituency won by the incumbent) until the end of the entire time period, even if that observation was no longer 'treated' after 2017 (won by the opposition party). Is that correct?
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2078
#4

14 May 2023, 13:36

I’m traveling, but I can show how to allow for “exit,” but it can’t be done using xthdidregress — at least not yet. 😉
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 209
#5

14 May 2023, 13:53

Dear Thibaud,

If the units of your analysis switch from treated to control, the estimators implemented might not be adequate for your analysis. I say MIGHT because you could code units as treated from the moment they are first treated, even if they later switch back to being controls. The implication is that units exposed to treatment remain affected by that initial exposure from then on. But you would need to justify this choice. In other words, it is a bit more than a coding decision, if I understand the problem correctly.
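The "once treated, always treated" recoding mentioned above could be sketched like this. It is only an illustration on invented data: fy and treated_abs are assumed names, and whether the recoding is defensible depends on the substantive argument, not on the code.

```stata
* Toy panel; fy and treated_abs are assumed names
clear
input ac fy treatedac
1 2011 0
1 2012 1
1 2013 0
2 2011 0
2 2012 0
2 2013 1
3 2011 0
3 2012 0
3 2013 0
end

* Running sum of treatedac within unit: positive from the first treated period onward
bysort ac (fy): gen byte treated_abs = sum(treatedac) > 0
list, sepby(ac)
```

Unit 1 stays coded as treated in 2013 even though its observed treatment returned to 0, so treated_abs follows the zeros-then-ones pattern that the command requires.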

Clément de Chaisemartin and Xavier D'Haultfoeuille deal with this issue in their 2020 AER paper and have Stata code for it (did_multiplegt). They also have really great accompanying material. I suggest you gather the information at Clément's website: https://sites.google.com/site/clementdechaisemartin/

Jeff, I am looking forward to hearing what you have to say about this issue.

Last edited by Enrique Pinzon (StataCorp); 14 May 2023, 13:58.
Thibaud Marcesse

Join Date: Aug 2020

Posts: 13
#6

14 May 2023, 14:05

Thanks, Enrique. I think the point about units being forever treated makes a lot of sense, but even after changing the code (that is, coding as 1 constituencies that were treated in 2012 but not in 2017), I get the same error message. Stata does not consider the 2012 treatment for some reason (_did_cohort is either '17' or 'Never treated'), which makes zero sense to me. Why would Stata not want to consider that initial treatment?

I'll be sure to check the materials recommended above in any case.
Enrique Pinzon (StataCorp)

StataCorp Employee

Join Date: Jan 2015

Posts: 209
#7

14 May 2023, 14:10

Hi Thibaud,

Could you please send your data and code, if possible, to [email protected] and we will take a closer look.
Sarah Gust

Join Date: Jul 2023

Posts: 2
#8

24 Jul 2023, 05:33

Originally posted by Jeff Wooldridge

I’m traveling, but I can show how to allow for “exit,” but it can’t be done using xthdidregress — at least not yet. 😉

Is there any news? Jeff Wooldridge, I am curious to see your approach to this.

FernandoRios

Join Date: Apr 2014
Posts: 2393

24 Jul 2023, 08:34

Hi Sarah,
I think Prof. Wooldridge's approach extends the extended TWFE model.
Assuming the control groups are good controls for all cases, the idea is to add interactions based on treatment type.
In other words, suppose you have two groups that were treated at the same time: one that remains treated and one that left treatment. You can then create a categorical variable with three categories:
never treated
treated and remain treated
treated but left treatment

Then run a similar specification that interacts treatment timing with type of treatment.

An example:

Code:

ssc install frause
frause mpdta, clear
// simulate treatment heterogeneity (the seed is arbitrary, added for reproducibility)
set seed 12345
gen ht = runiformint(1,2)
bysort countyreal: gen treat_het = ht[1] * treat
// identify groups by treatment timing and treatment type
egen ch_het = group(treat_het first_treat)
// allows full heterogeneity by year, cohort, and type of treatment
reg lemp i.year i.ch_het i(1 2).treat_het#2004.year#2004.first_treat ///
                         i(1 2).treat_het#2005.year#2004.first_treat ///
                         i(1 2).treat_het#2006.year#2004.first_treat ///
                         i(1 2).treat_het#2007.year#2004.first_treat ///
                         i(1 2).treat_het#2006.year#2006.first_treat ///
                         i(1 2).treat_het#2007.year#2006.first_treat ///
                         i(1 2).treat_het#2007.year#2007.first_treat
// aggregations would need to be done by hand (for now), but this is easily extended
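
In real data the treatment type is not simulated but has to be derived from the observed treatment path. A minimal sketch of that step, assuming a 0/1 indicator d observed each period (id, t, d, and trt_type are placeholder names, and the data are invented):

```stata
* Toy panel: id 1 is never treated, id 2 stays treated, id 3 exits
clear
input id t d
1 1 0
1 2 0
1 3 0
2 1 0
2 2 1
2 3 1
3 1 0
3 2 1
3 3 0
end

* Ever treated at any point?
bysort id (t): egen byte ever = max(d)

* Any switch from 1 back to 0?
bysort id (t): gen byte exit_here = (d == 0 & d[_n-1] == 1) if _n > 1
bysort id: egen byte exited = max(exit_here)

* Three categories, matching the list above
gen byte trt_type = 0 if ever == 0
replace trt_type = 1 if ever == 1 & exited == 0
replace trt_type = 2 if ever == 1 & exited == 1
label define trt_type 0 "never treated" 1 "treated, remains treated" 2 "treated, left treatment"
label values trt_type trt_type
tabulate trt_type
```

A variable like trt_type could then play the role that treat_het plays in the simulated example.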

Jeff Wooldridge, let me know if this captures the thoughts you were having, so I can "officially" add it to -jwdid-.
Fernando


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2078
#10

24 Jul 2023, 14:54

Thanks Fernando. Here's a link to my shared Dropbox. I've updated the example with exit to be a bit easier to implement. (It's in the subfolder called "did_staggered_exit.") It should correspond to your code, but I'd be happy for you to check it!

jwdid_dropbox
Sarah Gust

Join Date: Jul 2023

Posts: 2
#11

20 Sep 2023, 04:39

Thank you! This is very helpful!
Ria Sonpatki

Join Date: Dec 2023

Posts: 2
#12

19 Dec 2023, 06:16

Jeff Wooldridge thanks for sharing the Dropbox link. I'm encountering a similar issue, but I seem to be having trouble grasping the "did_staggered_exit" code. If anyone has experience understanding and interpreting this code, could you lend a hand in helping me comprehend it better?
Thank you.
Armande Mahabi

Join Date: Jul 2024

Posts: 1
#13

16 Jul 2024, 14:47

Hi,
I am using panel data (2014 to 2021) and the xtdidregress command. I am trying to analyse whether savings groups supervised by a female field officer responded differently to the Covid-19 shock than those supervised by a male field officer.
w is a variable that equals one for observations in 2020 or later where the group is supervised by a female field officer, and zero otherwise (including all observations before 2020). Gender: 1 if the field officer is a woman, 0 otherwise. PostCovid: 1 if year>=2020, 0 otherwise.
I'm using one of the suggested commands: xtdidregress (satis i.treated i.post) (procedure), nogteffects group(hospital) time(month)
In my case it's: xtdidregress (Savings i.Gender i.postCovid) (w), nogteffects group(ID) time(Year)

I have a problem with missing data. Some savings groups enter and exit the data several times. For example, I observe a group in 2014, 2015, 2017, 2019 and 2021. If I've understood the notes in the Stata help correctly, missing data before 2020 isn't a problem. The problem arises if data are missing in 2020 or 2021 (which is my case). Some groups are missing data in 2020, others in 2021.

Therefore, after the output table, I receive this note: “Treatment occurs at different times and estimation sample contains units that switch in and out of treatment.” Given that Covid-19 hit all savings groups simultaneously in 2020, do you think I would be wrong to ignore this note? Or should I use xthdidregress?
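
One way to see which groups are missing a 2020 or 2021 record is sketched below. The toy data are invented; ID and Year follow the names in the post, and has2020, has2021, and one_row are assumed helper names.

```stata
* Toy unbalanced panel: ID 2 lacks 2020, ID 3 lacks 2021
clear
input ID Year
1 2019
1 2020
1 2021
2 2019
2 2021
3 2019
3 2020
end

* Participation patterns across years
xtset ID Year
xtdescribe

* Flag groups with no record in 2020 or in 2021
bysort ID: egen byte has2020 = max(Year == 2020)
bysort ID: egen byte has2021 = max(Year == 2021)
egen byte one_row = tag(ID)
list ID has2020 has2021 if one_row & (has2020 == 0 | has2021 == 0)
```

Counting the listed groups gives a sense of how much of the sample the note might concern.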

Last edited by Armande Mahabi; 16 Jul 2024, 14:54.

Dilini Jayasinghe

Join Date: Nov 2024
Posts: 6

#14

10 Dec 2024, 05:17

I am using the xtdidregress command in Stata 18.0. The purpose is to identify the causal effect of an educational program on a treated cohort versus a control cohort. The treated cohort was born 4 years later than the control cohort.

This is the command I used

Code:

xtset hicid age_group
xtdidregress (learning i.treat i.post) (did), nogteffects group(hicid) time(age_group)

I am using age_group as my time variable because I can only compare the 2 groups across age_group and not across years (due to the 4-year gap between them). The policy was implemented for the treated cohort when they were 6-7 years old. Despite having data for all observations in the first wave (i.e., the 4-5 age group), when I run the above command, I receive the following output

Code:

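. xtdidregress (learning i.treat i.post) (did), nogteffects group(hicid) time(age_group)
note: 1.treat omitted because of collinearity.

Treatment and time information

Time variable: age_group
Control:       did = 0
Treatment:     did = 1
-----------------------------------
             |   Control  Treatment
-------------+---------------------
Group        |
       hicid |       980        906
-------------+---------------------
Time         |
     Minimum |         4          6
     Maximum |         8          8
-----------------------------------

Difference-in-differences regression                     Number of obs = 8,036
Data type: Longitudinal

                              (Std. err. adjusted for 1,886 clusters in hicid)
------------------------------------------------------------------------------
             |               Robust
    learning | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
ATET         |
         did |
   (1 vs 0)  |   .0979579   .0461282     2.12   0.034     .0074902    .1884255
------------------------------------------------------------------------------
Note: ATET estimate adjusted for covariates and panel effects.
Note: Treatment occurs at different times and estimation sample contains units that switch in and out of treatment.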

treat = all observations in the treated cohort (that is younger cohort).
control = all observations in the untreated (i.e., earlier) cohort

post = 1 for all observations in both cohorts aged 6 and above, and 0 otherwise.

did = post*treat

These are my problems:

1) A note appears

Code:

note: 1.treat omitted because of collinearity.

When I checked the number of observations for treat and did (i.e., post*treat), they differ, so the two variables are not identical. That suggests there should not be collinearity.

Code:

. tabulate treat if cohort == "B"

      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      4,953      100.00      100.00
------------+-----------------------------------
      Total |      4,953      100.00

. tabulate did if cohort == "B"

        did |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        906       18.29       18.29
          1 |      4,047       81.71      100.00
------------+-----------------------------------
      Total |      4,953      100.00

. tabulate treat if cohort == "K"

      treat |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      5,486      100.00      100.00
------------+-----------------------------------
      Total |      5,486      100.00

. tabulate did if cohort == "K"

        did |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      5,486      100.00      100.00
------------+-----------------------------------
      Total |      5,486      100.00

So I can't figure out why Stata omits the treat variable, citing collinearity.
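
One possible explanation, offered as an assumption rather than a diagnosis of this dataset: the output reports that the ATET is "adjusted for covariates and panel effects", and a variable that never changes within a panel is collinear with panel effects regardless of how it relates to did. The sketch below, on invented data, shows how to check whether treat varies within hicid (sd_treat is an assumed helper name).

```stata
* Toy panel in which treat is constant within each hicid
clear
input hicid age_group treat
1 4 0
1 6 0
1 8 0
2 4 1
2 6 1
2 8 1
end
xtset hicid age_group

* The "within" standard deviation is 0 for a time-invariant variable
xtsum treat

* The same check by hand: spread of treat inside each panel
bysort hicid: egen double sd_treat = sd(treat)
summarize sd_treat
```

If sd_treat is 0 for every panel in the real data, treat carries no within-panel variation.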

2) In the first output, under Time, the maximum for both groups is 8, indicating that units in each group first appear at different ages. But this is not true when I check the data. All units in the control group first appear at age 4, and none start at age 8. Similarly, all units in the treated group first appear only at age 6.

Is there a way to ask Stata to give me a list of unique IDs where above (2) doesn't hold?
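
A minimal sketch of one way to produce such a list follows. It runs on invented data and rests on assumptions: age_group is taken to hold the values 4, 6, and 8, control units are expected to start at age 4, treated units are expected to have did first equal 1 at age 6, and first_age, first_did_age, off_schedule, and one_row are made-up helper names.

```stata
* Toy data: hicid 2 (control) first appears at 8; hicid 4 (treated) has no age-6 record
clear
input hicid age_group treat did
1 4 0 0
1 6 0 0
1 8 0 0
2 8 0 0
3 4 1 0
3 6 1 1
3 8 1 1
4 4 1 0
4 8 1 1
end

* First age at which each unit is observed
bysort hicid (age_group): gen first_age = age_group[1]

* First age at which did equals 1 (missing for units that are never treated)
bysort hicid: egen first_did_age = min(cond(did == 1, age_group, .))

* Units that deviate from the expected timing
gen byte off_schedule = (treat == 0 & first_age != 4) | (treat == 1 & first_did_age != 6)

* One row per unit
egen byte one_row = tag(hicid)
list hicid treat first_age first_did_age if one_row & off_schedule
```

In the toy data the list contains hicid 2 and hicid 4.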
