
  • Issue with xthdidregress command on STATA 18

    Hi,

    I am trying to use the xthdidregress with panel data (STATA 18). I am looking at the implementation of a poverty alleviation program in India, using elections as a `treatment' (to determine whether constituencies held by the incumbent see more spending). I have panel data for 10 fiscal years, which I reshaped as long before running xtset and xthdidregress. There are two elections in the dataset, so basically two `treatments' (so two time periods, and two cohorts as a result). I have created a treatment variable for each of the fiscal years, coded as 1 when the incumbent at the state level holds the constituency and 0 otherwise. I have also reshaped these treatment variables, which is why I only type 'treatedac' rather than the full form for each fiscal year ('treatedac12' 'treatedac13' etc.)

    Whenever I run xthdidregress, I get the following error, and I am not sure where it is coming from. I strongly suspect it has to do with the way I coded the treatment variable (or the way I am using it), but I am not sure why. STATA 18 should be able to handle heterogeneous treatment under this specification, right? When I look at the data editor, I see that STATA recognized the second treatment (the election of 2017), but not the first (the election of 2012).

    Here's the code:

    . xthdidregress ra (logtotpdac prop_poor) (treatedac), group(ac)
    note: variable _did_cohort, containing cohort indicators formed by treatment variable treatedac and
    group variable ac, was added to the dataset.
    invalid treatment
    The treatment is assumed to be staggered. Once a unit is treated, it should remain treated.
    r(498);

    Thanks for letting me know!

  • #2
    Dear Thibaud,

    -xthdidregress- looks at the treatment group, formed using -treatedac- and -ac-, to verify that you have a valid treatment. The pattern of the treatment within -ac- should be a group of zeros followed by ones, or a pattern of only zeros. If at some point there is a pattern of zeros, followed by ones, and then zeros again, you will trigger this error message. In other words, once a unit is treated it remains treated and cannot revert to being a control. You can look at your -ac-, -treatedac-, your time variable, and _did_cohort to see what is happening in your data.
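
The check described above can be sketched roughly as follows. This is only an illustration under assumptions: the thread never names the time variable, so `fy` (fiscal year) is a hypothetical name, and the data are assumed to be in long form with one row per `ac` and `fy`. `_did_cohort` is the variable that the note in the original output says was added to the dataset.

```stata
* Assumption: fy is a hypothetical name for the fiscal-year variable
* Flag observations where treatment falls from 1 back to 0 within a constituency
bysort ac (fy): generate byte reverts = (treatedac < treatedac[_n-1]) ///
    if _n > 1 & !missing(treatedac, treatedac[_n-1])

* Mark every observation of a constituency that reverts at least once
egen byte ever_reverts = max(reverts), by(ac)

* Count constituencies with and without a reversal (one row per constituency)
egen byte one_per_ac = tag(ac)
tabulate ever_reverts if one_per_ac, missing

* Inspect the full treatment path of the constituencies that revert
sort ac fy
list ac fy treatedac _did_cohort if ever_reverts == 1, sepby(ac)
```

Any constituency listed by the last command has a 0, 1, 0 pattern of the kind that triggers the "invalid treatment" message.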


    • #3
      Thanks a lot Enrique! So this basically means that I need to code the treatment variable as 1 for an observation 'treated' in 2012 (i.e. in a constituency won by the incumbent) until the end of the entire time period, even if that observation was no longer 'treated' after 2017 (won by the opposition party). Is that correct?


      • #4
        I’m traveling, but I can show how to allow for “exit,” but it can’t be done using xthdidregress — at least not yet. 😉


        • #5
          Dear Thibaud,

          If the units of your analysis switch from treated to control, the estimators implemented might not be adequate for your analysis. I say MIGHT because you could code units as being treated once they have been treated, even if they switch to become controls later. The implication here is that units that are exposed to treatment are somewhat forever after affected by this initial exposure. But you would need to justify this choice. In other words, it is a bit more than a coding decision, if I understand the problem correctly.
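
A minimal sketch of the "once treated, always treated" coding mentioned above, assuming a hypothetical time variable `fy` and no gaps in the yearly panel. Whether this coding is defensible is the substantive question raised in the paragraph; the code only shows the mechanics.

```stata
* Assumption: fy is a hypothetical name for the fiscal-year variable
* ever_treated switches to 1 in the first treated year and stays 1 afterwards
bysort ac (fy): generate byte ever_treated = sum(treatedac == 1) > 0

* Confirm that the new indicator never decreases within a constituency
bysort ac (fy): assert ever_treated >= ever_treated[_n-1] if _n > 1

* Same specification as in the original post, using the absorbing indicator
xthdidregress ra (logtotpdac prop_poor) (ever_treated), group(ac)
```

The `assert` line stops with an error if any constituency still shows a 1 followed by a 0, which makes it a quick test of the staggered pattern before estimation.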

          Clément de Chaisemartin and Xavier D'Haultfoeuille deal with this issue in their 2020 AER paper and have Stata code for it (did_multiplegt). They also have really great accompanying material. I suggest you gather the information at Clément's website: https://sites.google.com/site/clementdechaisemartin/

          Jeff, I am looking forward to hearing what you have to say about this issue.
          Last edited by Enrique Pinzon (StataCorp); 14 May 2023, 13:58.


          • #6
            Thanks, Enrique. I think the point made in re units being forever treated makes a lot of sense, but even after changing the code (that is, coding as 1 constituencies that were treated in 12 but not in 17), I get the same error message. STATA does not consider the treatment of 2012 for some reason (the did_cohort is either '17' or 'Never treated'), which makes zero sense to me. Why would STATA not want to consider that initial treatment?

            I'll be sure to check the materials recommended above in any case.


            • #7
              Hi Thibaud,

              Could you please send your data and code, if possible, to [email protected] and we will take a closer look.


              • #8
                Originally posted by Jeff Wooldridge:
                I’m traveling, but I can show how to allow for “exit,” but it can’t be done using xthdidregress — at least not yet. 😉
                Is there any news, Jeff Wooldridge? I am curious to see your approach to this.


                • #9
                  Hi Sarah,
                  I think Prof. Wooldridge's approach is an extension of the extended TWFE model.
                  Assuming the control groups are good controls for all cases, the idea is to add interactions based on treatment type.
                  In other words, suppose you have two groups that were treated at the same time: one that remains treated and one that later left treatment. You can then create a categorical variable with three categories:
                  never treated
                  treated and remain treated
                  treated but left treatment

                  You then run a similar specification that interacts treatment timing with type of treatment.

                  An example:

                  Code:
                  ssc install frause
                  frause mpdta, clear
                  // create artificial treatment heterogeneity: each county is assigned type 1 or 2
                  gen ht=runiformint(1,2)
                  bysort countyreal:gen treat_het = ht[1] * treat
                  // identify groups by treatment cohort and treatment type
                  egen ch_het = group(treat_het first_treat)
                  // allow for full heterogeneity by year, cohort, and type of treatment
                  reg lemp i.year i.ch_het i(1 2).treat_het#2004.year#2004.first_treat ///
                                           i(1 2).treat_het#2005.year#2004.first_treat ///
                                           i(1 2).treat_het#2006.year#2004.first_treat ///
                                           i(1 2).treat_het#2007.year#2004.first_treat ///
                                           i(1 2).treat_het#2006.year#2006.first_treat ///
                                           i(1 2).treat_het#2007.year#2006.first_treat ///
                                           i(1 2).treat_het#2007.year#2007.first_treat
                  // aggregations would need to be done by hand (for now), but this is easily extended
                  Jeff Wooldridge, let me know if this captures the thoughts you were having, so I can "officially" add it to -jwdid-.
                  Fernando


                  • #10
                    Thanks Fernando. Here's a link to my shared Dropbox. I've updated the example with exit to be a bit easier to implement. (It's in the subfolder called "did_staggered_exit.") It should correspond to your code, but I'd be happy for you to check it!

                    jwdid_dropbox


                    • #11
                      Thank you! This is very helpful!


                      • #12
                        Jeff Wooldridge thanks for sharing the Dropbox link. I'm encountering a similar issue, but I seem to be having trouble grasping the "did_staggered_exit" code. If anyone has experience understanding and interpreting this code, could you lend a hand in helping me comprehend it better?
                        Thank you.


                        • #13
                          Hi,
                          I am using panel data (2014 to 2021) and the xtdidregress command. I am trying to analyse whether the response to the Covid-19 shock differs between savings groups supervised by a female field officer and those supervised by a male field officer.
                          w is a variable that equals one for observations in 2020 or later where the group is supervised by a female field officer, and zero otherwise. Gender: 1 if the field officer is a woman, 0 otherwise. PostCovid: 1 if year>=2020, 0 otherwise.
                          I'm using one of the suggested commands: xtdidregress (satis i.treated i.post) (procedure), nogteffects group(hospital) time(month)
                          In my case it is: xtdidregress (Savings i.Gender i.postCovid) (w), nogteffects group(ID) time(Year)

                          I have a problem with missing data. Some savings groups enter and exit the data several times. For example, I observe one group in 2014, 2015, 2017, 2019 and 2021. If I have understood the notes in the Stata help correctly, missing data before 2020 is not a problem. The problem arises when data are missing in 2020 or 2021 (which is my case): some groups have no data for 2020, others none for 2021.

                          As a result, after the output table, I receive this note: “Treatment occurs at different times and estimation sample contains units that switch in and out of treatment.” Given that Covid-19 hit all savings groups simultaneously in 2020, would it be wrong to ignore this note? Or should I use xthdidregress?
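
One plausible reading of the note, offered here as an assumption rather than a confirmed diagnosis, is that the first year in which `w` equals 1 differs across savings groups: a group with no data for 2020 is first seen as treated in 2021. A short sketch to inspect this, using the variable names from the command above:

```stata
* First year in which each savings group is observed with w == 1
* (missing for groups that are never observed with w == 1)
egen first_w = min(cond(w == 1, Year, .)), by(ID)

* Keep one row per savings group for the summaries below
egen byte one_per_id = tag(ID)

* A single non-missing value (2020) would mean uniform treatment timing
tabulate first_w if one_per_id, missing

* Groups that are first observed as treated after 2020
list ID first_w if one_per_id & first_w > 2020 & !missing(first_w), noobs
```

If the tabulation shows both 2020 and 2021, the staggered timing reported in the note comes from the gaps in the data rather than from the timing of the shock itself.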
                          Last edited by Armande Mahabi; 16 Jul 2024, 14:54.


                          • #14
                            I am using the xtdidregress command in Stata 18.0. The purpose is to identify the causal effect of an educational program by comparing a treated cohort with a control cohort. The treated cohort was born 4 years later than the control cohort.

                            This is the command I used:
                            Code:
                            xtset hicid age_group
                            xtdidregress (learning i.treat i.post) (did), nogteffects group(hicid) time(age_group)
                            I am using age_group as my time variable because I can only compare the 2 groups across age groups, not across calendar years (due to the 4-year gap between them). The policy was implemented for the treated cohort when they were 6-7 years old. Despite having data for all observations in the first wave (i.e., the 4-5 age group), when I run the above command I receive the following output:

                            Code:
                            . xtdidregress (learning i.treat i.post) (did), nogteffects group(hicid) time(age_group)
                            note: 1.treat omitted because of collinearity.
                            
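                            Treatment and time information

                            Time variable: age_group
                            Control:       did = 0
                            Treatment:     did = 1
                            -----------------------------------
                                         |   Control  Treatment
                            -------------+---------------------
                            Group        |
                                   hicid |       980        906
                            -------------+---------------------
                            Time         |
                                 Minimum |         4          6
                                 Maximum |         8          8
                            -----------------------------------

                            Difference-in-differences regression                     Number of obs = 8,036
                            Data type: Longitudinal

                                                           (Std. err. adjusted for 1,886 clusters in hicid)
                            ------------------------------------------------------------------------------
                                         |               Robust
                                learning | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                            ATET         |
                                     did |
                               (1 vs 0)  |   .0979579   .0461282     2.12   0.034     .0074902    .1884255
                            ------------------------------------------------------------------------------
                            Note: ATET estimate adjusted for covariates and panel effects.
                            Note: Treatment occurs at different times and estimation sample contains units that switch in and out of treatment.

                            The variables are defined as follows:

                            treat = 1 for all observations in the treated (i.e., younger) cohort, 0 otherwise.
                            control = all observations in the untreated (i.e., earlier) cohort.

                            post = 1 for all observations in both cohorts aged 6 and above, 0 otherwise.

                            did = post*treat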


                            These are my problems:

                            1) A note appears:
                            Code:
                            note: 1.treat omitted because of collinearity.
                            When I tabulated treat and did (i.e., post*treat), their distributions differ, so they are not the same variable. That suggests there should be no collinearity between them.

                            Code:
                            . tabulate treat if cohort == "B"
                            
                                  treat |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                      1 |      4,953      100.00      100.00
                            ------------+-----------------------------------
                                  Total |      4,953      100.00
                            
                            . tabulate did if cohort == "B"
                            
                                    did |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                      0 |        906       18.29       18.29
                                      1 |      4,047       81.71      100.00
                            ------------+-----------------------------------
                                  Total |      4,953      100.00
                            
                            . tabulate treat if cohort == "K"
                            
                                  treat |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                      0 |      5,486      100.00      100.00
                            ------------+-----------------------------------
                                  Total |      5,486      100.00
                            
                            . tabulate did if cohort == "K"
                            
                                    did |      Freq.     Percent        Cum.
                            ------------+-----------------------------------
                                      0 |      5,486      100.00      100.00
                            ------------+-----------------------------------
                                  Total |      5,486      100.00
                            So I can't figure out why Stata omits the treat variable because of collinearity.

                            2) In the first output, under Time, the maximum for both groups is 8, which indicates that within each group some observations first appear at different ages. But this is not what I see when I check the data. All units in the control group first appear at age 4, and none start at age 8. Similarly, all units in the treated group first appear treated at age 6.

                            Is there a way to ask Stata to give me a list of unique IDs for which (2) does not hold?
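
Both questions can be examined with a few lines of Stata. On (1), the output's own note says the ATET is adjusted for "panel effects", so a regressor that never changes within hicid would be collinear with those effects, however it relates to did. On (2), the sketch below lists the IDs whose first observed age group, or first age group with did equal to 1, is not the expected one. It assumes that age_group takes the values 4, 6 and 8, as the Minimum and Maximum rows of the output suggest, and it is an illustration rather than a verified diagnosis.

```stata
* (1) Does treat vary within a unit? Compare its within-unit min and max
egen treat_min = min(treat), by(hicid)
egen treat_max = max(treat), by(hicid)
count if treat_min != treat_max   // 0 means treat is constant within every hicid

* (2) First age group observed, and first age group with did == 1, per unit
egen first_obs = min(age_group), by(hicid)
egen first_did = min(cond(did == 1, age_group, .)), by(hicid)

* One row per unit, then list the units that break the expected pattern
* (first_did is missing for treated units never observed with did == 1,
*  and those units are listed as well)
egen byte one_per_id = tag(hicid)
list hicid treat first_obs first_did if one_per_id & ///
    ((treat == 0 & first_obs != 4) | (treat == 1 & first_did != 6)), noobs
```

A treated unit with no observation in the 6-7 age group would show first_did equal to 8, which would be consistent with the Maximum of 8 reported in the Treatment column.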


