  • generalized/staggered difference-in-difference with moderator

    Hello

    I am using a generalized/staggered difference-in-difference approach to investigate the effect of transferring from a public employer to a private employer.

    Here is my code:
    Code:
    xtset id time
    xtreg outcome i.treatment i.active_treatment i.time [aw = cem_weights] , fe cluster(id)
    treatment is 1 in the group that receives the treatment (and is 1 in those observations at all times, including before treatment started) and 0 in all observations for the untreated group.
    active_treatment is 1 in the treatment group after treatment begins, but is 0 in the treatment group before treatment begins and is 0 in all observations in the control group
    I have read that the coefficient for active_treatment is my generalized difference-in-difference coefficient.
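
    For readers following along, here is a minimal sketch of how an indicator like active_treatment could be built. It assumes a hypothetical variable first_treated_time (not in the original post) holding the period in which treatment begins for treated individuals, missing for controls.

    ```stata
    * first_treated_time is a hypothetical variable: the period in which
    * treatment starts for a treated unit, missing (.) for never-treated units.
    * treatment is the time-invariant group indicator described above.
    generate byte active_treatment = treatment == 1 & time >= first_treated_time

    * Sanity checks: controls are never flagged, and treated units are
    * flagged only from their own start period onward
    assert active_treatment == 0 if treatment == 0
    assert active_treatment == 0 if treatment == 1 & time < first_treated_time
    ```

    Because Stata treats missing values as larger than any number, the comparison with a missing first_treated_time is false for controls, and the treatment == 1 condition guards against that case as well.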

    If possible, I would also like to moderate the effects of the treatment by job type:
    Code:
    xtreg outcome i.treatment i.active_treatment##i.job_type i.time i.partnership [aw = cem_weights] , fe cluster(id)
    I have attempted to use margins and marginsplot to visualize the effect:
    Code:
    margins job_type, at(active_treatment=(0 1))
    marginsplot
    [Attached image: interaction.JPG]
    However, I would like some help interpreting the plot, as I am not sure what is left in the 0-category. It might also be that the interaction needs to be calculated in a completely different way.

    Appreciate any help!

    Gustav

  • #2
    There is no 0-category (or reference category) in the -margins- output. Your variable job_type takes on exactly 6 values, 1 through 6, in your estimation sample. (If you think the variable should have more values than that, then they have been omitted from estimation due to missing values on other model variables, or the observations reflecting those other values have somehow been dropped from your data set before estimation. Either way, that's a problem in its own right that you should investigate.)

    Your -marginsplot- output has 6 lines, each corresponding to a level of job_type. The points at the left end of the lines correspond to the expected outcome when active_treatment = 0, and those at the right are the corresponding expected outcomes when active_treatment = 1. That's all there is to it.
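
    As a hedged illustration of that reading, the sketch below re-fits the moderated model from #1 (variable names as in the original post) and then asks -margins- for the difference between the right and left end of each line, which is the estimated effect of active_treatment within each job type. Whether these quantities are what the analysis ultimately needs is for the original poster to judge.

    ```stata
    * Moderated model from #1
    xtreg outcome i.treatment i.active_treatment##i.job_type i.time i.partnership [aw = cem_weights], fe cluster(id)

    * Expected outcome for each job type at active_treatment = 0 and = 1
    * (these are the two ends of each line in the plot)
    margins job_type, at(active_treatment = (0 1))
    marginsplot

    * Right end minus left end of each line: the effect of
    * active_treatment separately for each level of job_type
    margins job_type, dydx(active_treatment)
    ```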



    • #3
      Hi Clyde

      Many thanks for the quick reply. Job_type is a categorical variable that reflects broad job categories (initially 1200 categories). After matching, five separate categories remain (such as care work or office work), plus one category that I have, for the time being, called "other".
      My question is, more precisely, how to interpret active_treatment = 0. My take is that it is the post-treatment outcome for the control group. Is that correct?
      Last edited by Gustav Egede Hansen; 21 Jun 2021, 00:25.



      • #4
        Or would it be more correct to say that it is the counterfactual outcome for the treatment group (i.e., the outcome if they had not received the treatment)?



        • #5
          My question is, more precisely, how to interpret active_treatment = 0. My take is that it is the post-treatment outcome for the control group. Is that correct?
          Or would it be more correct to say that it is the counterfactual outcome for the treatment group (i.e., the outcome if they had not received the treatment)?
          In a sense it is both, but strictly speaking it is neither. The observations where the variable active_treatment = 0 are a heterogeneous collection of entities that are never treated and other entities that will be treated eventually but haven't been yet. To the extent that you can imagine that these are equivalent, then you can consider both of your interpretations to be correct. But that equivalence is hardly guaranteed and often false in the real world, so really neither interpretation applies and there is, in fact, no simple description of this group. They exist mathematically simply as a comparator group for those observations which are under treatment at the time they are observed.

          Despite this complexity, because the model includes fixed effects for both entities and time periods, in the context of these other variables, the coefficient of this exotic variable turns out to be a DID estimator of the treatment effect.
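
          For reference, the model being described can be written out as below. The notation is an assumption introduced here for illustration and does not appear in the thread: $\alpha_i$ are the individual fixed effects, $\gamma_t$ the time-period effects, and $D_{it}$ stands for active_treatment.

          ```latex
          % Two-way fixed-effects (generalized DID) model
          y_{it} = \alpha_i + \gamma_t + \delta D_{it} + \varepsilon_{it}

          % In the simplest case (two groups, two periods), the estimate of
          % \delta reduces to the familiar difference of differences in means
          \hat{\delta} = \left(\bar{y}_{T,\mathrm{post}} - \bar{y}_{T,\mathrm{pre}}\right)
                       - \left(\bar{y}_{C,\mathrm{post}} - \bar{y}_{C,\mathrm{pre}}\right)
          ```

          The time-invariant treatment indicator does not appear separately because it is absorbed by $\alpha_i$.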



          • #6
            Again, thank you so much! I cannot tell you how much I appreciate your input.

            I have one more question to ask if you do not mind. It is about whether staggered diff-in-diff is actually the right approach to my study.

            My treatment group consists of three groups of public employees who are transferred to different private companies in different years. The three cases of outsourcing have nothing to do with each other, and a treated individual from one case of outsourcing does not reappear in another case of outsourcing. It is, in other words, not a gradual rollout of the intervention. Furthermore, I have ensured that a treated individual in, e.g., outsourcing 1 could not become a control in, e.g., outsourcing 2.

            I prefer to use the classical diff-in-diff as it allows me to see the effect in each time period, e.g.:
            Code:
            reg salary i.treatment##i.time [aw = cem_weights], cluster(id)
            margins treatment, at(time=(97(1)103))
            So my question is, can I use the classical diff-in-diff with standardized time, or should I go with the staggered diff-in-diff approach?

            Gustav
            Last edited by Gustav Egede Hansen; 22 Jun 2021, 08:07.



            • #7
              Unless I am misunderstanding, you cannot do what you propose in #6. The difficulty is that the same value of time will be a pre-outsource year for some people and a post-outsource year for others. The most critical part of the DID model is that you have an interaction term that identifies the treated entities in their post-treatment status. The code in #6 does not do that.



              • #8
                Thank you again!

                That does not sound good. However, let me show you an example of my data. Here you have two treated individuals:
                Code:
                clear
                input float(id year time outsourcing_year treatment outsourcing_case company_id sector salary)
                 1 2010  97 0 1 1  1 0 20000
                 1 2011  98 0 1 1  1 0 20000
                 1 2012  99 0 1 1  1 0 20000
                 1 2013 100 1 1 1  2 1 18000
                 1 2014 101 0 1 1  2 1 18000
                 1 2015 102 0 1 1  2 1 18000
                 1 2016 103 0 1 1  2 1 18000
                12 2013  97 0 1 2 11 0 24000
                12 2014  98 0 1 2 11 0 24000
                12 2015  99 0 1 2 11 0 24000
                12 2016 100 1 1 2 22 1 20000
                12 2017 101 0 1 2 22 1 20000
                12 2018 102 0 1 2 22 1 20000
                12 2019 103 0 1 2 22 1 20000
                end
                label values sector sektor
                label def sektor 0 "public", modify
                label def sektor 1 "private", modify
                As you can see, ID = 1 experiences outsourcing in the year 2013, and ID = 12 experiences outsourcing in the year 2016. This is because they are part of two different and independent cases of outsourcing. The time-variable is a “standardization” of years. When time is 100, outsourcing begins irrespective of whether the year is 2013 or 2016. In my full dataset, where I have my control group, I have no problem running the code. I am just worried that the standardization of time might be the wrong approach to deal with different treatment timing.
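
                The standardization described here can be sketched as follows for the two treated individuals in the example. The names start_year and event_time are hypothetical; the sketch only covers treated individuals, since the thread does not show how controls are assigned a reference year.

                ```stata
                * Recover each person's outsourcing calendar year from the indicator
                * outsourcing_year, which is 1 only in the year of the transfer
                egen start_year = max(cond(outsourcing_year == 1, year, .)), by(id)

                * Standardized time: 100 in the outsourcing year, 99 the year before, ...
                generate event_time = year - start_year + 100

                * In the example data this should reproduce the existing time variable
                assert event_time == time
                ```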



                • #9
                  Yes, that makes sense for your regression.

                  But, -margins treatment, at(time=(97(1)103))- makes no sense in this context. Seeing this command in #6, I assumed that your time variable was still the calendar year (minus 2000); so, as I suspected in #7, I did not fully understand what you were saying.



                  • #10
                    Ah, I see. Sorry for not being clear enough. I wanted to use 97…100…103 as I did not expect Stata to handle -3…0…3.
                    So if I understand your input correctly, I can use the classical diff-in-diff for my analysis, i.e.:
                    Code:
                    reg salary i.treatment##i.time [aw = cem_weights], cluster(id)
                    instead of the staggered diff-in-diff:
                    Code:
                    xtreg outcome i.treatment i.active_treatment i.time [aw = cem_weights] , fe cluster(id)
                    Is that correct?
                    Last edited by Gustav Egede Hansen; 22 Jun 2021, 13:35.
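
                    On the point about negative values: Stata's factor-variable notation (i.time) only accepts non-negative integers, so shifting is needed. A minimal sketch, assuming a hypothetical variable rel_time that runs from -3 to 3, shifts the values and attaches labels so that output and plots can still display the relative periods:

                    ```stata
                    * rel_time is a hypothetical variable holding -3 ... 3
                    generate time100 = rel_time + 100

                    * Label 97 as "-3", 98 as "-2", ..., 103 as "3"
                    forvalues t = 97/103 {
                        label define time100_lbl `t' "`=`t' - 100'", add
                    }
                    label values time100 time100_lbl
                    ```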



                    • #11
                      Sorry, yes, you are on the right track here. And the -margins, at(time = (97(1)103))- command is fine. I got a little mixed up because when I saw that I was thinking that 97 corresponded to 1997, and 103 to 2003, and so on. But, no, you have it right. It's just a little odd to start at 97, but that's what you get when you standardize the outsourcing year to 100. It is perfectly OK to do. So, proceed.

                      One other thing, you should still be using -xtreg, fe-, not -reg-, because you have repeated observations on the same individuals.

                      Sorry for my confusion on this.



                      • #12
                        No need to say sorry! You have helped me tremendously and I am very grateful.

                        And yes I will use xtreg and fe.



                        • #13
                          Hi Clyde,

                          I have a follow-up question if you do not mind. When estimating my difference-in-difference with fixed effects, I use the -noestimcheck- option to calculate the predictive margins and marginal effects. However, the difference between the control and treatment group in the first time period (98) is zero in all my models. Again, I cannot share my original dataset, but I have made an example dataset to illustrate the issue:

                          Code:
                          clear
                          input float(id time treatment salary)
                          14 100 0 20000
                           7 100 0 20000
                          13 102 0 23000
                          13  98 0 23000
                           9 102 0 21000
                          10 101 0 20000
                          10 102 0 20000
                          10 103 0 20000
                          14  99 0 22000
                          10  98 0 21000
                          14 101 0 20000
                           7  98 0 20000
                          13 100 0 23000
                           6  99 0 24000
                           9  98 0 21000
                           8 103 0 20000
                          10  99 0 21000
                           8 101 0 20000
                           8 100 0 20000
                           8  99 0 20000
                           7 102 0 20000
                           7 101 0 20000
                           6 102 0 24000
                          13  99 0 23000
                           6 101 0 24000
                           7  99 0 20000
                          13 103 0 22000
                           9  99 0 21000
                           9 100 0 21000
                          14 103 0 20000
                           8 102 0 20000
                           9 103 0 21000
                          14  98 0 20000
                           8  98 0 20000
                          13 101 0 23000
                           6 100 0 24000
                          14 102 0 20000
                           6  98 0 24000
                          10 100 0 20000
                           7 103 0 20000
                           9 101 0 21000
                           6 103 0 24000
                           3 102 1 24000
                           1 103 1 15000
                          12  98 1 20000
                           4 100 1 15000
                           5  99 1 20000
                           2 100 1 17000
                          12  99 1 20000
                          12 101 1 20000
                           3 103 1 24000
                           3  99 1 21000
                           1 101 1 15000
                           2  99 1 20000
                           4 102 1 15000
                           1 102 1 15000
                           3 100 1 24000
                           2 103 1 17000
                           2 101 1 17000
                           2  98 1 20000
                           1  99 1 20000
                           1  98 1 20000
                           5 102 1 22000
                           1 100 1 15000
                           5  98 1 20000
                           5 101 1 22000
                          12 100 1 20000
                           4 101 1 15000
                           3  98 1 21000
                           3 101 1 24000
                           4  99 1 19000
                           2 102 1 17000
                           4 103 1 15000
                           4  98 1 19000
                          12 102 1 19000
                           5 100 1 21000
                          12 103 1 19000
                           5 103 1 22000
                          end

                          Here is some code for illustration:
                          Code:
                          xtset id time
                          xtreg salary i.treatment##i.time, fe cluster(id)
                          margins treatment, at(time =(98 99 100 101 102 103)) noestimcheck
                                              *_at#treatment
                                              *1 0  |     20692.31
                                              *1 1  |     20692.31
                          marginsplot
                          I get the same result at time = 98 when estimating the marginal effects:
                          Code:
                          margins, dydx(treatment) at(time =(98 99 100 101 102 103)) noestimcheck
                                              *1.treatment
                                              *_at
                                              * 1  |     0
                          marginsplot
                          Can you (or someone else) explain what is happening in the first time period? I would like to assess the parallel trend, so this is problematic.


                          Gustav



                          • #14
                            I see. The problem you are encountering was actually flagged for you by Stata: you used the -noestimcheck- option here. But you did that because without it you got "(not estimable)" results, which was Stata's warning that you might be doing something that is ill-defined. And in this setting it is, indeed, ill-defined. The problem is that the variable treatment is collinear with the fixed effects. To break the collinearity, the treatment variable is omitted from the model. But that is an arbitrary decision. The collinearity could have been broken by omitting one of the fixed effects (try it with -regress salary i.treatment##i.time i.id- and you will see that treatment is retained in the model, but an extra id is dropped). This means that the effects of treatment and the fixed effects cannot be separately identified. When you use -noestimcheck- you persuade Stata to do some calculations, but the results are a function of the particular way the collinearity is broken, and they are not meaningful effects.
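
                            The point can be checked on the example data from #13 with the sketch below. It is an illustration, not part of the original reply. It also shows one common way to look at pre-treatment differences that does not depend on how the collinearity is broken: the treatment#time interaction coefficients, which compare the treated-control gap in each period with the gap in a base period. Choosing period 99 as the base is an assumption made here (it is the last period before outsourcing at time 100).

                            ```stata
                            xtset id time

                            * treatment never changes within id, so -xtreg, fe- omits 1.treatment
                            xtreg salary i.treatment##i.time, fe vce(cluster id)

                            * Same model with explicit id indicators: treatment is kept, but an
                            * extra id indicator is dropped instead (the arbitrary choice above)
                            regress salary i.treatment##i.time i.id, vce(cluster id)

                            * The treatment#time coefficients are the same in both fits.
                            * Re-fit with 99 as the base period so that each interaction is the
                            * treated-control gap in that period relative to the gap in period 99
                            xtreg salary i.treatment##ib99.time, fe vce(cluster id)

                            * Pre-period check: is the gap in 98 different from the gap in 99?
                            test 1.treatment#98.time
                            ```

                            With more pre-treatment periods, the corresponding interaction terms could be tested jointly in the same way. Unlike the -regress- version, the -xtreg, fe- fits do not need one indicator per individual.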



                            • #15
                              Hi Clyde Schechter. I have been thinking a lot about your answer and solution recently. However, in this specific case, I cannot break the collinearity using -regress salary i.treatment##i.time i.id-, as I have too many observations at the individual level. Is there another way to break the collinearity? My issue is that I would like to see the development over time with fixed effects to assess the parallel trends.
