DID with multiple periods and treatments

Marcela Vieira

Join Date: Aug 2021

Posts: 27
#1

07 Oct 2021, 03:25

Dear all,

I am analysing the impacts of dung beetles (treatment) on livestock productivity (outcome) using Difference-in-Differences. I have panel data from 1960 to 1980, and my geographical units are Local Government Areas (LGAs). My sample size is 94 LGAs. I have five treatments (five dung beetle species), each with presence/absence and abundance (treatment intensity). However, each species was introduced into the LGAs in a different year and spread over time, so treatment timing varies: e.g. in 1974 species 1 was present in an LGA, then in 1978 species 2 arrived in the same LGA. So while some LGAs might have all five species at some point in time, others will have only one species or none.

The problems/questions are:
1. When I look at presence/absence of all species together (dummy_general), there are very few control areas (<10 LGAs), because at some point most LGAs had at least one beetle species, even if in low numbers. I am unsure how best to handle such a small control group. Would it be better to look at treatment intensity instead, or even to combine the dummy with treatment intensity (abundance)? If so, could you help me with the code to include both the dummy and treatment intensity?

Currently, my code is very simple:

xtdidregress (livestock_productivity) (dummy_general), group(lga_id) time(year)

xtdidregress (livestock_productivity) (abundance, continuous), group(lga_id) time(year)

2. What is the best way to deal with the parallel trend assumption when there are multiple periods and multiple treatments?

Here is a sample of my data.

Thanks a lot for your help!
Marcela Vieira

Join Date: Aug 2021

Posts: 27
#2

10 Oct 2021, 19:09

Sorry, here is a sample of my data.
George Ford

Join Date: Aug 2014

Posts: 2669
#3

13 Oct 2021, 08:47

A lot of recent literature on treatment timing. Tricky stuff.

Code:

ssc install did_multiplegt
ssc install csdid

I think csdid is the best implementation I've seen for the treatment timing issue. You can specify not-yet-treated units as controls (or not), but I'm not sure it can handle intensity. did_multiplegt permits it.
FernandoRios

Join Date: Apr 2014

Posts: 2312
#4

13 Oct 2021, 08:55

To add to George's help, you also need to install

Code:

ssc install drdid

Unfortunately, csdid doesn't have an option for intensity. Not yet at least.
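
Note for readers following along: csdid needs a gvar variable holding each unit's first treatment year, with 0 for never-treated units. A minimal sketch of building one from the presence dummy in post #1, assuming the variable names lga_id, year and dummy_general from that post, and assuming presence is treated as absorbing (an LGA counts as treated from the first year any species is recorded). It has not been run on the original data.

```stata
* Assumed variable names from post #1: lga_id, year, dummy_general
* First year in which any species is present; missing if never present
bysort lga_id: egen first_treat = min(cond(dummy_general == 1, year, .))

* csdid expects never-treated units to be coded 0
replace first_treat = 0 if missing(first_treat)

* Count LGAs per cohort to see how thin the never-treated group is
egen byte one_per_lga = tag(lga_id)
tabulate first_treat if one_per_lga
```

The row for first_treat == 0 in the tabulation is the never-treated group, which post #1 reports as fewer than 10 LGAs.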
Marcela Vieira

Join Date: Aug 2021

Posts: 27
#5

17 Nov 2021, 01:49

Hi George and Fernando,

Thank you very much for your help. I finally managed to apply csdid but am struggling to understand the results. The average treatment effect is positive and significant, but some of the group-level estimates are negative and some have p-values > 0.1.

My code is: csdid aue_total_ha sum_prec, ivar(lgacodes) time(year1_lga) gvar(first_treat) method(drimp)

estat simple // positive and significant
Average Treatment Effect on Treated
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         ATT |    .365382   .0637398     5.73   0.000     .2404542    .4903098
------------------------------------------------------------------------------

. estat pretrend //but fail parallel test
Pretrend Test. H0 All Pre-treatment are equal to 0
chi2(48) = 668.8259377746836
p-value = 2.7743049849e-110

. estat group
ATT by group
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       G1972 |   .5609801   .0446091    12.58   0.000     .4735477    .6484124
       G1973 |   .4978316   .1721364     2.89   0.004     .1604505    .8352127
       G1974 |   .3036821   .0676202     4.49   0.000      .171149    .4362152
       G1975 |  -.0325673   .0557461    -0.58   0.559    -.1418278    .0766931
       G1977 |  -.2190189    .072184    -3.03   0.002     -.360497   -.0775408
       G1979 |  -.3638112   .0879044    -4.14   0.000    -.5361007   -.1915218
       G1980 |  -.2369263   .2650211    -0.89   0.371    -.7563582    .2825056
------------------------------------------------------------------------------

It also failed the parallel trends test, so I might need to find covariates that could be affecting the trend and check for outliers. Any other suggestions when this happens?

Any help is much appreciated,
FernandoRios

Join Date: Apr 2014

Posts: 2312
#6

17 Nov 2021, 05:38

Regarding the effects: it is what it is. Whatever treatment you are analyzing, it seems to have a positive effect only for early cohorts and a negative one for later cohorts. It may be because later cohorts have been under treatment for shorter periods. I would also look into the dynamic effects (estat event).

Regarding the parallel trends, that is a problem. It may be that you can't use DID because of that.
Marcela Vieira

Join Date: Aug 2021

Posts: 27
#7

22 Nov 2021, 18:11

Thank you Fernando - the estat event is great!

As I understand it, estat pretrend takes into account never-treated and also not-yet-treated units as controls, is that right? I am wondering whether it is possible to test for parallel trends based only on never-treated units. This is a tricky question with divergent opinions, I am sure...

I have used the following code to visualise the trend...but am unsure if a formal test is required and the best approach for my case (multiple periods and treatments).

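// plot of group means over years
// collapse to one mean per year and treatment status (treat = 0 or 1)
collapse (mean) aue_total_ha, by(year treat)
// one column per group: aue_total_ha0 (never treated), aue_total_ha1 (ever treated)
reshape wide aue_total_ha, i(year) j(treat)
twoway connected aue_total_ha* year, sort name(group_means, replace)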

where 'treat' receives the value of 1 if that unit was treated at some point in time and 0 if it was never treated.

Many thanks,
FernandoRios

Join Date: Apr 2014

Posts: 2312
#8

22 Nov 2021, 18:27

Hi Marcela
So, the pretrend test basically checks whether all the ATT(g,t)s before treatment took place are equal to zero. Whether it uses not-yet-treated units as controls depends on whether you add the "notyet" option.
Otherwise, all comparisons are against never-treated units.
Perhaps my slides here https://friosavila.github.io/playing...did_csdid.html can help.

I think what you are doing may also work. But I would have to see the plots to be sure.
Best wishes
Fernando
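
A minimal sketch of the two comparison groups described in this reply, reusing the csdid call from post #5. The variable names are the ones in that post, and nothing here has been run on the original data.

```stata
* Default: only never-treated units serve as controls
csdid aue_total_ha sum_prec, ivar(lgacodes) time(year1_lga) gvar(first_treat) method(drimp)
estat pretrend

* notyet: not-yet-treated units are added to the control group
csdid aue_total_ha sum_prec, ivar(lgacodes) time(year1_lga) gvar(first_treat) method(drimp) notyet
estat pretrend
```

Comparing the two estat pretrend results shows how sensitive the pre-treatment test is to the choice of control group, which matters here because the never-treated group is small.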
Marcela Vieira

Join Date: Aug 2021

Posts: 27
#9

22 Nov 2021, 20:36

Thanks - the slides are very helpful!

Here is an example of the plot of group means over years. Visually, I would say that outcome_a follows a fairly parallel trend but outcome_b does not.
The first intervention year here is 1972 and I am not including control variables in my plot (because I don't know how to add that to the code). Is there a way to add those controls to the plot?

What are your thoughts on these trends?

Many thanks,

Last edited by Marcela Vieira; 22 Nov 2021, 20:39.
FernandoRios

Join Date: Apr 2014

Posts: 2312
#10

23 Nov 2021, 12:28

I would say that in both cases, the parallel trends assumption is violated.
FYI
you could obtain something similar using:
estat event
csdid_plot
Focusing only on the estimates before treatment takes place.
HTH
Samuel Nocito

Join Date: Apr 2020

Posts: 12
#11

27 Nov 2021, 08:36

Dear all,

I am using the Stata csdid command built by FernandoRios for one of my research projects. I have a question about the number of observations reported in the table of results. Specifically, the table of results (say, when using csdid with the option agg(simple)) shows a number of observations that is smaller than the sample size. When I browse e(sample), I see that this smaller number of observations covers never-treated observations until the first treatment period of the last treated group, plus all pre-treatment periods of the treated groups. Why is that?

Thank you very much in advance for your help!
Samuel
FernandoRios

Join Date: Apr 2014

Posts: 2312
#12

27 Nov 2021, 18:41

Hi Samuel
First of all, I would ask if you could update both csdid and drdid. In the last version of the commands, I added the information for the number of observations.
In fact, if you type:
matrix list e(gtt)
right after csdid, it will give you the full detail of observations used for each 2x2 DID done by CSDID.
Let me know if that solves your problem
Fernando

Samuel Nocito

Join Date: Apr 2020
Posts: 12

#13

29 Nov 2021, 01:31

Hi FernandoRios,

thank you very much for your reply! Actually, I've tried to type "matrix list e(gtt)" right after the CSDID command but I'm still confused about the reported number of observations; sorry if I'll sound naive.
Let me use the example suggested in the CSDID's help: the sample is a panel of 500 counties over 5 years (2003-2007); 309 never treated (NT) counties and 191 treated ones (20 counties treated in 2004, 40 counties in 2006 and 131 in 2007). Total number of observations is 2,500.

Code:

use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear

count
2,500

csdid lemp lpop , ivar(countyreal) time(year) gvar(first_treat) method(dripw) agg(simple)
............
Difference-in-difference with Multiple Time Periods

                                                Number of obs     =      1,900
Outcome model  : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATT |  -.0417518   .0115028    -3.63   0.000    -.0642969   -.0192066
------------------------------------------------------------------------------
Control: Never Treated

See Callaway and Sant'Anna (2020) for details

count if e(sample)
1,900

matrix list e(gtt)

e(gtt)[12,7]
     cohort      t0      t1   error       N   N_trt  N_cntr
 r1    2004    2003    2004       0     329     309      20
 r2    2004    2003    2005       0     329     309      20
 r3    2004    2003    2006       0     329     309      20
 r4    2004    2003    2007       0     329     309      20
 r5    2006    2003    2004       0     349     309      40
 r6    2006    2004    2005       0     349     309      40
 r7    2006    2005    2006       0     349     309      40
 r8    2006    2005    2007       0     349     309      40
 r9    2007    2003    2004       0     440     309     131
r10    2007    2004    2005       0     440     309     131
r11    2007    2005    2006       0     440     309     131
r12    2007    2006    2007       0     440     309     131

tab year if first_treat==0 & e(sample)

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2003 |        309       25.00       25.00
       2004 |        309       25.00       50.00
       2005 |        309       25.00       75.00
       2006 |        309       25.00      100.00
------------+-----------------------------------
      Total |      1,236      100.00

tab year if first_treat!=0 & e(sample)

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2003 |        191       28.77       28.77
       2004 |        171       25.75       54.52
       2005 |        171       25.75       80.27
       2006 |        131       19.73      100.00
------------+-----------------------------------
      Total |        664      100.00

What is not clear to me is why the number of observations in e(sample) is 1,900. These 1,900 observations (see the tab results) are made up of all NT counties from 2003 to 2006 (so 2007 is ruled out for NT?) and some observations for treated counties. Specifically, the number of observations for treated counties decreases over time: in 2003 we have all treated counties, in 2004 and 2005 we lose the counties treated in 2004, and in 2006 we lose the counties treated in 2004 and 2006 (i.e., 60 counties). Why is that? Sorry again if this sounds naive!

How can I derive the total number of observations (i.e., 1,900) from the matrix?

Another minor question: should the titles of the last two columns of the "matrix list e(gtt)" be inverted? For me, the last column seems to show the N. of treated units instead of control ones.

Thank you very much again for your time and help!
Best,
Samuel


FernandoRios

Join Date: Apr 2014
Posts: 2312

#14

29 Nov 2021, 07:45

Hi Samuel
I think I know the problem. Did you get the latest version of drdid as well?
In an older version, I had a different way to count observations when using panel data. I have since changed that.
If for some reason you have already installed the latest version from SSC and are still getting the same odd results, please use the one I'm attaching here.
What you should get after matrix list e(gtt) should be this:

Code:

. csdid  lemp lpop , ivar(countyreal) time(year) gvar(first_treat) method(dripw)
............
Difference-in-difference with Multiple Time Periods

                                                         Number of obs = 2,500
Outcome model  : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2004        |
 t_2003_2004 |  -.0145297   .0221292    -0.66   0.511     -.057902    .0288427
 t_2003_2005 |  -.0764219   .0286713    -2.67   0.008    -.1326166   -.0202271
 t_2003_2006 |  -.1404483   .0353782    -3.97   0.000    -.2097882   -.0711084
 t_2003_2007 |  -.1069039   .0328865    -3.25   0.001    -.1713602   -.0424476
-------------+----------------------------------------------------------------
g2006        |
 t_2003_2004 |  -.0004721   .0222234    -0.02   0.983    -.0440293     .043085
*****

e(gtt)[12,7]
     cohort      t0      t1   error       N   N_trt  N_cntr
 r1    2004    2003    2004       0     658     618      40
 r2    2004    2003    2005       0     658     618      40
 r3    2004    2003    2006       0     658     618      40
 r4    2004    2003    2007       0     658     618      40
 r5    2006    2003    2004       0     698     618      80
 r6    2006    2004    2005       0     698     618      80
 r7    2006    2005    2006       0     698     618      80
 r8    2006    2005    2007       0     698     618      80
 r9    2007    2003    2004       0     880     618     262
r10    2007    2004    2005       0     880     618     262
r11    2007    2005    2006       0     880     618     262
r12    2007    2006    2007       0     880     618     262

Regarding your last question, you cannot reconstruct the total number of observations from the detailed counts, because the samples overlap.

Let me know if you can replicate this
Fernando

Attached Files

drdid.ado (83.5 KB, 1 view)
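
The overlap can be seen directly from the e(gtt) listing in this post. A small sketch, assuming the column order shown there (N is the fifth column) and that csdid has just been run on the mpdta example:

```stata
* Sum of N over the twelve 2x2 comparisons listed in this post
display 4*658 + 4*698 + 4*880

* Same sum taken from the stored matrix (column 5 = N, assumed from the listing)
mata:
G = st_matrix("e(gtt)")
sum(G[., 5])
end
```

The twelve N values add up to 8,944, far more than the 2,500 observations in the data, because the same never-treated counties (the 618 in every row) are reused in each comparison.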


Samuel Nocito

Join Date: Apr 2020

Posts: 12
#15

29 Nov 2021, 13:32

Dear FernandoRios,

thank you very much for your reply! I reinstalled the drdid package via SSC and that solved the problem; the total number of observations is now correct. Many thanks again!
However, I still have some doubts about the e(gtt) matrix:
Are the titles of the last two columns inverted? For me, the last column seems to show the N. of treated units instead of control ones.

Why has the number of treated observations in the last column doubled after updating the DRDID command?

Finally, I take the chance to ask you one more question: Does the CSDID command include an option for estimating events with a universal base (e.g., t = -1) or is it only possible to estimate varying base events? (see: https://bcallaway11.github.io/posts/...ng-base-period).

Thank you very much again for your time!
Best,
Samuel
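
On the doubling: comparing the two e(gtt) listings in posts #13 and #14, every count in the newer listing is exactly twice the older one. One reading, which is an inference from the numbers and not something confirmed in the thread, is that the older version counted counties while the newer one counts county-year observations, two per county in each 2x2 comparison. The arithmetic for the 2004 cohort:

```stata
* 309 never-treated counties and 20 counties first treated in 2004 (post #13)
display 309 + 20          // counties in the comparison
display 2 * (309 + 20)    // county-year observations over the two periods t0 and t1
```

These give 329 and 658, the N values reported for the 2004 cohort in the older and newer listings respectively.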