  • DID with multiple periods and treatments

    Dear all,

    I am analysing the impact of dung beetles (treatment) on livestock productivity (outcome) using difference-in-differences (DID). I have panel data from 1960 to 1980, and my geographical units are Local Government Areas (LGAs); my sample contains 94 LGAs. I have five treatments (five dung beetle species), each measured as presence/absence and as abundance (treatment intensity). However, each species was introduced into the LGAs in a different year and spread over time, so treatment starts at multiple points in time: e.g., species 1 was present in an LGA in 1974, and species 2 arrived in the same LGA in 1978. So while some LGAs have all five species at some point, others have only one species or none.

    The problems/questions are:
    1. When I look at presence/absence of all species together (dummy_general), there are very few control areas (<10 LGAs): at some point, most LGAs had at least one beetle species, even if in low numbers. I am unsure how best to deal with so few control areas. In this case, would it be better to look at treatment intensity instead, or even to combine the dummy with treatment intensity (abundance)? If so, could you help me with the code to include both the dummy and treatment intensity?
    Currently, my code is very simple:

    Code:
    xtset lga_id year
    xtdidregress (livestock_productivity) (dummy_general), group(lga_id) time(year)

    xtdidregress (livestock_productivity) (abundance, continuous), group(lga_id) time(year)
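A minimal sketch, not a recommendation, of one way to put both the presence dummy and the abundance measure in the same model: a plain two-way fixed-effects (TWFE) regression with LGA and year fixed effects and standard errors clustered by LGA. It runs on a small simulated panel so that it is self-contained; the 20 LGAs, arrival years and effect sizes are all hypothetical. As far as I know, xtdidregress accepts a single treatment variable, hence xtreg here. With staggered arrival, TWFE can be biased when effects differ across cohorts or over time, which is the problem the estimators suggested later in the thread are designed to address.

```stata
* Hypothetical simulated panel: 20 LGAs observed 1960-1980
clear
set seed 2021
set obs 20
gen lga_id = _n
* Hypothetical first arrival year: about 25% of LGAs never treated (coded 0)
gen first_treat = cond(runiform() < 0.25, 0, 1970 + floor(10*runiform()))
expand 21
bysort lga_id: gen year = 1959 + _n
* Presence dummy switches on at first arrival; abundance is nonzero only when present
gen byte dummy_general = (first_treat > 0 & year >= first_treat)
gen abundance = dummy_general * 100 * runiform()
* Hypothetical outcome with both a presence effect and an abundance effect
gen livestock_productivity = 1 + 0.3*dummy_general + 0.002*abundance + rnormal()

* TWFE with both regressors: the dummy captures the shift when beetles are present,
* abundance the additional change per unit of abundance
xtset lga_id year
xtreg livestock_productivity i.dummy_general c.abundance i.year, fe vce(cluster lga_id)
```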


    2. What is the best way to deal with the parallel trend assumption when there are multiple periods and multiple treatments?

    Here is a sample of my data.

    Thanks a lot for your help!

  • #2
    Sorry, here is a sample of my data.


    • #3
      A lot of recent literature on treatment timing. Tricky stuff.

      Code:
      ssc install did_multiplegt
      ssc install csdid
      I think csdid is the best implementation I've seen for handling treatment timing. You can choose whether or not to use not-yet-treated units as controls, but I'm not sure it can handle treatment intensity; did_multiplegt permits it.
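A minimal sketch of the two control-group choices mentioned above, using the example dataset from the csdid help file (the mpdta URL quoted later in this thread) rather than the beetle data. It assumes csdid and drdid have been installed from SSC; lemp is the outcome, lpop a covariate, and first_treat is 0 for never-treated counties.

```stata
* Example data from the csdid help file: 500 counties, 2003-2007
use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear

* Default: only never-treated counties (first_treat == 0) serve as controls
csdid lemp lpop, ivar(countyreal) time(year) gvar(first_treat) method(dripw)
estat simple

* notyet: counties not yet treated at time t are added to the control group
csdid lemp lpop, ivar(countyreal) time(year) gvar(first_treat) method(dripw) notyet
estat simple
```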



      • #4
        To add to George's help,
        you also need to install:
        Code:
        ssc install drdid
        Unfortunately, csdid doesn't have an option for treatment intensity, at least not yet.
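Because csdid works with a binary, staggered treatment, the presence dummy first has to be turned into a cohort variable giving the year each LGA was first treated (0 if never treated). A minimal sketch on a hypothetical three-LGA toy panel, assuming the variable names from the first post; it treats the first year with dummy_general == 1 as the start of treatment, even if presence later switches off.

```stata
* Hypothetical toy panel using the variable names from the first post
clear
input lga_id year dummy_general
1 1970 0
1 1971 1
1 1972 1
2 1970 0
2 1971 0
2 1972 0
3 1970 0
3 1971 0
3 1972 1
end

* First year in which any beetle species is present (missing if never)
bysort lga_id: egen first_treat = min(cond(dummy_general == 1, year, .))
* csdid expects never-treated units to be coded 0 in gvar()
replace first_treat = 0 if missing(first_treat)
list, sepby(lga_id)
```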



        • #5
          Hi George and Fernando,

          Thank you very much for your help. I finally managed to apply csdid, but I am struggling to understand the results. The average treatment effect is positive and significant, but several of the group-level effects are negative, and some have p-values above 0.1.

          My code is:
          Code:
          csdid aue_total_ha sum_prec, ivar(lgacodes) time(year1_lga) gvar(first_treat) method(drimp)

          Code:
          . estat simple    // positive and significant
          Average Treatment Effect on Treated
          ------------------------------------------------------------------------------
                       | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   ATT |    .365382   .0637398     5.73   0.000     .2404542    .4903098
          ------------------------------------------------------------------------------

          . estat pretrend    // but fails the parallel-trends test
          Pretrend Test. H0 All Pre-treatment are equal to 0
          chi2(48) = 668.8259377746836
          p-value  = 2.7743049849e-110

          . estat group
          ATT by group
          ------------------------------------------------------------------------------
                       | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                 G1972 |   .5609801   .0446091    12.58   0.000     .4735477    .6484124
                 G1973 |   .4978316   .1721364     2.89   0.004     .1604505    .8352127
                 G1974 |   .3036821   .0676202     4.49   0.000      .171149    .4362152
                 G1975 |  -.0325673   .0557461    -0.58   0.559    -.1418278    .0766931
                 G1977 |  -.2190189    .072184    -3.03   0.002     -.360497   -.0775408
                 G1979 |  -.3638112   .0879044    -4.14   0.000    -.5361007   -.1915218
                 G1980 |  -.2369263   .2650211    -0.89   0.371    -.7563582    .2825056
          ------------------------------------------------------------------------------

          It also fails the parallel-trends test, so I may need to find covariates that could be affecting the trend and check for outliers. Any other suggestions for when this happens?


          Any help is much appreciated,



          • #6
            Regarding the effects: it is what it is. Whatever treatment you are analyzing, it seems to have a positive effect only for early cohorts and a negative one for later cohorts. That may be because later cohorts have been under treatment for shorter periods. I would also look into the dynamic effects (estat event).

            Regarding parallel trends, that is a problem. It may be that you can't use DID because of it.



            • #7
              Thank you Fernando - the estat event is great!

              As I understand it, estat pretrend uses both never-treated and not-yet-treated units as controls; is that right? I am wondering whether it is possible to test for parallel trends using only never-treated units. This is a tricky question with divergent opinions, I am sure...

              I have used the following code to visualise the trends, but I am unsure whether a formal test is required and what the best approach is for my case (multiple periods and treatments).

              Code:
              // plot of group means over years (preserve/restore keeps the original data)
              preserve
              collapse (mean) aue_total_ha, by(year treat)
              reshape wide aue_total_ha, i(year) j(treat)
              twoway connected aue_total_ha* year, sort name(group_means, replace)
              restore

              where treat equals 1 if the unit was treated at some point in time and 0 if it was never treated.
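One way to build treat from the csdid cohort variable, as a small sketch on a hypothetical toy dataset (variable names as in post #5). The point to watch is that Stata treats a missing value as larger than any number, so (first_treat > 0) on its own would mark units with a missing first_treat as treated.

```stata
* Hypothetical toy data: never treated (0), treated from 1972, unknown (.)
clear
input lgacodes first_treat
1 0
2 1972
3 .
end

* Naive version: missing > 0 is true in Stata, so lgacodes 3 gets 1
gen byte treat_naive = (first_treat > 0)
* Safer version: leave treat missing when first_treat is missing
gen byte treat = (first_treat > 0) if !missing(first_treat)
list
```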

              Many thanks,



              • #8
                Hi Marcella
                So, estat pretrend basically tests whether all the ATT(g,t)s for periods before treatment took place are jointly equal to zero. Whether it uses not-yet-treated units as controls depends on whether you add the notyet option.
                Otherwise, all comparisons are against never-treated units.
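A simplified sketch of what is being tested, ignoring covariates and the doubly robust weighting. Here g is the first treatment year, C is the never-treated group (roughly, units not yet treated by t when notyet is used) and the bars denote group means; the notation is mine, not the command's. The base period b matches the t0/t1 pairs that matrix list e(gtt) shows later in this thread (t0 = t - 1 before treatment, t0 = g - 1 afterwards), and the joint test is presumably a Wald statistic of this form.

```latex
\widehat{ATT}(g,t) = \left(\bar{Y}_{g,t} - \bar{Y}_{g,b}\right) - \left(\bar{Y}_{C,t} - \bar{Y}_{C,b}\right),
\qquad
b = \begin{cases} g-1, & t \ge g \\ t-1, & t < g \end{cases}

H_0:\; ATT(g,t) = 0 \;\; \text{for all } t < g,
\qquad
W = \hat{\theta}_{\mathrm{pre}}^{\top}\, \hat{V}_{\mathrm{pre}}^{-1}\, \hat{\theta}_{\mathrm{pre}} \;\sim\; \chi^2(K)
```

Here the vector of pre-treatment estimates stacks the K terms with t < g and V is their estimated covariance matrix; the test reported in post #5 has K = 48 degrees of freedom.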
                Perhaps my slides here https://friosavila.github.io/playing...did_csdid.html can help.

                I think what you are doing may also work. But I would have to see the plots to be sure.
                Best wishes
                Fernando



                • #9
                  Thanks - the slides are very helpful!

                  Here is an example of the plot of group means over years. Visually, I would say that outcome_a follows a fairly parallel trend but outcome_b does not.
                  The first intervention year here is 1972, and I am not including control variables in the plot (because I don't know how to add them to the code). Is there a way to add those controls to the plot?
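One possible way to bring a control into this kind of plot, sketched on the csdid help-file data (mpdta) rather than the beetle data: regress the outcome on the control, keep the residuals, and plot residual group means by year. Here lemp stands in for aue_total_ha and lpop for sum_prec. This is only a visual aid under that assumption: the pooled regression also uses post-treatment years, so some of the treatment effect can leak into the adjustment, and it is not the covariate adjustment csdid itself performs.

```stata
use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear
* Ever-treated indicator; first_treat is 0 for never-treated counties
gen byte treat = (first_treat > 0) if !missing(first_treat)

* Remove the part of the outcome linearly explained by the control
regress lemp lpop
predict double lemp_resid, residuals

* Group means of the residualized outcome by year, without losing the data
preserve
collapse (mean) lemp_resid, by(year treat)
reshape wide lemp_resid, i(year) j(treat)
twoway connected lemp_resid0 lemp_resid1 year, sort ///
    legend(order(1 "Never treated" 2 "Ever treated")) name(resid_means, replace)
restore
```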

                  [Images: group_means - outcome_a.jpg; group_means - outcome_b.jpg]
                  What are your thoughts on these trends?

                  Many thanks,
                  Last edited by Marcela Vieira; 22 Nov 2021, 20:39.



                  • #10
                    I would say that parallel trends is violated in both cases.
                    FYI, you could obtain something similar using:
                    Code:
                    estat event
                    csdid_plot
                    Focusing only on the estimates before treatment takes place.
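A minimal, self-contained version of that workflow on the csdid help-file data (mpdta, loaded from the URL quoted later in the thread), assuming csdid and drdid are installed from SSC. In the resulting plot, the estimates at negative event times are the pre-treatment ones to inspect.

```stata
use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear
csdid lemp lpop, ivar(countyreal) time(year) gvar(first_treat) method(dripw)

* Aggregate the ATT(g,t)s by time relative to first treatment
estat event
* Plot the event-study estimates produced by estat event
csdid_plot
* Joint test that all pre-treatment ATT(g,t)s equal zero
estat pretrend
```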
                    HTH



                    • #11
                      Dear all,

                      I am using the Stata csdid command written by FernandoRios for one of my research projects. I have a question about the number of observations reported in the table of results. Specifically, the table of results (say, when using csdid with the option agg(simple)) shows a number of observations that is smaller than the sample size. When I browse e(sample), I see that this smaller number of observations corresponds to never-treated observations up to the first treatment period of the last treated group, plus all pre-treatment periods of the treated groups. Why is that?

                      Thank you very much in advance for your help!
                      Samuel



                      • #12
                        Hi Samuel
                        First of all, could you update both csdid and drdid? In the latest version of the commands, I added information on the number of observations.
                        In fact, if you type:
                        matrix list e(gtt)
                        right after csdid, it will give you the full detail of observations used for each 2x2 DID done by CSDID.
                        Let me know if that solves your problem
                        Fernando



                        • #13
                          Hi FernandoRios,

                          Thank you very much for your reply! I've tried typing "matrix list e(gtt)" right after the csdid command, but I'm still confused about the reported number of observations; sorry if this sounds naive.
                          Let me use the example suggested in the CSDID's help: the sample is a panel of 500 counties over 5 years (2003-2007); 309 never treated (NT) counties and 191 treated ones (20 counties treated in 2004, 40 counties in 2006 and 131 in 2007). Total number of observations is 2,500.

                          Code:
                          use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear
                          
                          count
                          2,500
                          
                          csdid lemp lpop , ivar(countyreal) time(year) gvar(first_treat) method(dripw) agg(simple)
                          ............
                          Difference-in-difference with Multiple Time Periods
                          
                                                                          Number of obs     =      1,900
                          Outcome model  : least squares
                          Treatment model: inverse probability
                          ------------------------------------------------------------------------------
                                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                                   ATT |  -.0417518   .0115028    -3.63   0.000    -.0642969   -.0192066
                          ------------------------------------------------------------------------------
                          Control: Never Treated
                          
                          See Callaway and Sant'Anna (2020) for details
                          
                          count if e(sample)
                          1,900
                          
                          matrix list e(gtt)
                          
                          e(gtt)[12,7]
                               cohort      t0      t1   error       N   N_trt  N_cntr
                           r1    2004    2003    2004       0     329     309      20
                           r2    2004    2003    2005       0     329     309      20
                           r3    2004    2003    2006       0     329     309      20
                           r4    2004    2003    2007       0     329     309      20
                           r5    2006    2003    2004       0     349     309      40
                           r6    2006    2004    2005       0     349     309      40
                           r7    2006    2005    2006       0     349     309      40
                           r8    2006    2005    2007       0     349     309      40
                           r9    2007    2003    2004       0     440     309     131
                          r10    2007    2004    2005       0     440     309     131
                          r11    2007    2005    2006       0     440     309     131
                          r12    2007    2006    2007       0     440     309     131
                          
                          tab year if first_treat==0 & e(sample)
                          
                                 year |      Freq.     Percent        Cum.
                          ------------+-----------------------------------
                                 2003 |        309       25.00       25.00
                                 2004 |        309       25.00       50.00
                                 2005 |        309       25.00       75.00
                                 2006 |        309       25.00      100.00
                          ------------+-----------------------------------
                                Total |      1,236      100.00
                          
                          tab year if first_treat!=0 & e(sample)
                          
                                 year |      Freq.     Percent        Cum.
                          ------------+-----------------------------------
                                 2003 |        191       28.77       28.77
                                 2004 |        171       25.75       54.52
                                 2005 |        171       25.75       80.27
                                 2006 |        131       19.73      100.00
                          ------------+-----------------------------------
                                Total |        664      100.00
                          What is not clear to me is why the number of observations in e(sample) equals 1,900. These 1,900 observations (see the tab results) are made up of all NT counties from 2003 to 2006 (so 2007 is ruled out for NT?) and some of the observations for treated counties. Specifically, the number of observations for treated counties decreases over time: in 2003 we have the observations of all treated counties, in 2004 and 2005 we lose the observations of counties treated in 2004, and in 2006 we lose the observations of counties treated in 2004 and 2006 (i.e., 60 counties). Why is that? Sorry again if it sounds naive!

                          How can I derive the total number of observations (i.e., 1,900) from the matrix?

                          Another minor question: should the titles of the last two columns of "matrix list e(gtt)" be swapped? To me, the last column seems to show the number of treated units rather than control units.

                          Thank you very much again for your time and help!
                          Best,
                          Samuel



                          • #14
                            Hi Samuel
                            I think I know the problem. Did you also get the latest version of drdid?
                            In an older version, I counted observations differently when using panel data. I have since changed that.
                            If, for some reason, you have already installed the latest version from SSC and are still getting the same odd results, please get the one I'm attaching here.
                            After matrix list e(gtt), you should get this:
                            Code:
                            . csdid  lemp lpop , ivar(countyreal) time(year) gvar(first_treat) method(dripw)
                            ............
                            Difference-in-difference with Multiple Time Periods
                            
                                                                                     Number of obs = 2,500
                            Outcome model  : least squares
                            Treatment model: inverse probability
                            ------------------------------------------------------------------------------
                                         | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
                            -------------+----------------------------------------------------------------
                            g2004        |
                             t_2003_2004 |  -.0145297   .0221292    -0.66   0.511     -.057902    .0288427
                             t_2003_2005 |  -.0764219   .0286713    -2.67   0.008    -.1326166   -.0202271
                             t_2003_2006 |  -.1404483   .0353782    -3.97   0.000    -.2097882   -.0711084
                             t_2003_2007 |  -.1069039   .0328865    -3.25   0.001    -.1713602   -.0424476
                            -------------+----------------------------------------------------------------
                            g2006        |
                             t_2003_2004 |  -.0004721   .0222234    -0.02   0.983    -.0440293     .043085
                            *****
                            
                            e(gtt)[12,7]
                                 cohort      t0      t1   error       N   N_trt  N_cntr
                             r1    2004    2003    2004       0     658     618      40
                             r2    2004    2003    2005       0     658     618      40
                             r3    2004    2003    2006       0     658     618      40
                             r4    2004    2003    2007       0     658     618      40
                             r5    2006    2003    2004       0     698     618      80
                             r6    2006    2004    2005       0     698     618      80
                             r7    2006    2005    2006       0     698     618      80
                             r8    2006    2005    2007       0     698     618      80
                             r9    2007    2003    2004       0     880     618     262
                            r10    2007    2004    2005       0     880     618     262
                            r11    2007    2005    2006       0     880     618     262
                            r12    2007    2006    2007       0     880     618     262
                            Regarding your last question, you cannot reconstruct the total number of observations from the detailed counts, because the samples overlap.
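The per-row counts in the updated matrix can be checked by hand, assuming each 2x2 comparison uses both periods (t0 and t1) for every county in the cohort and in the never-treated group (309 never-treated counties; cohorts of 20, 40 and 131 counties):

```latex
N^{2\times 2}_{g} = 2\,\bigl(N_{\text{never}} + n_{g}\bigr):
\quad 2(309 + 20) = 658,\quad 2(309 + 40) = 698,\quad 2(309 + 131) = 880

2 \times 309 = 618,\qquad 2 \times 20 = 40,\qquad 2 \times 40 = 80,\qquad 2 \times 131 = 262
```

On this reading, the updated matrix counts unit-period observations rather than units, which would account for the doubling relative to the older output, and the column labelled N_trt (618) lines up with the never-treated counties while N_cntr lines up with the cohort. Because the 309 never-treated counties enter every row, adding rows overcounts; the reported total of 500 × 5 = 2,500 observations is the union of the row samples.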

                            Let me know if you can replicate this
                            Fernando


                            • #15
                              Dear FernandoRios,

                              Thank you very much for your reply! I reinstalled the drdid package from SSC, and that solved the problem: the total number of observations is now correct. Many thanks again!
                              However, I still have some doubts about the e(gtt) matrix:
                              1. Are the titles of the last two columns swapped? To me, the last column seems to show the number of treated units rather than control units.
                              2. Why has the number of treated observations in the last column doubled after updating drdid?
                              Finally, let me ask one more question: does csdid include an option for estimating event studies with a universal base period (e.g., t = -1), or is it only possible to use a varying base period? (see: https://bcallaway11.github.io/posts/...ng-base-period).

                              Thank you very much again for your time!
                              Best,
                              Samuel

