Difference-in-Differences AND panel data

Andreas Psarras

Join Date: Nov 2020

Posts: 10
#1


22 Feb 2022, 00:58

Hi Statalist,
I am trying to estimate the decrease in a count variable caused by lockdowns during the COVID-19 pandemic, using the difference-in-differences method. I have a monthly dataset covering six years (2015-2020) for 57 areas. Lockdowns started in March 2020, so I want to compare the outcome y (a count variable) in March-December 2020 with the same months in the previous years (2015-2019), using the first two months of each year as the control.
I think that Poisson regression would be more appropriate; however, I also want to use didregress and compare the results.
I use the following commands:

xtset areas time_my
xtdidregress ( y i.march_onwards i.year2020) (did), group(month) time(time_my) nogteffects

where:
areas=1…57
month=1…12
march_onwards takes 1 for march until December and 0 otherwise
year2020 takes 1 for 2020 and 0 otherwise
did=march_onwards*year2020
time_my is a variable holding month and year, e.g. 2015m1
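
The indicator variables listed above can be built directly from time_my. What follows is a minimal sketch, assuming that time_my is stored as a Stata monthly date (as the 2015m1 example suggests) and that the dataset so far holds only areas, time_my and y. The helper variable year is introduced for illustration and does not appear in the original post.

```stata
* Assumption: time_my is a Stata monthly date (%tm), e.g. 2015m1.
* dofm() converts a monthly date to a daily date;
* yofd() and month() then extract the year and the calendar month.
generate int  year  = yofd(dofm(time_my))
generate byte month = month(dofm(time_my))

* 1 for March-December, 0 for January-February
generate byte march_onwards = (month >= 3)

* 1 for 2020, 0 for 2015-2019
generate byte year2020 = (year == 2020)

* Interaction term used as the DID indicator in the post
generate byte did = march_onwards * year2020

* Quick check: did should equal 1 only for March-December 2020
tabulate year march_onwards if did == 1
```

If month already exists in the dataset, the second generate line should be skipped, since Stata will not overwrite an existing variable.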

With this command I get the error “area not nested within month”.

I am stuck and I cannot understand what I am doing wrong.

Last edited by Andreas Psarras; 22 Feb 2022, 01:01.

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17708
#2

22 Feb 2022, 03:27

Andreas:
I do not see a control group in your research description.
That said, why not consider something along the lines of the following toy example (which uses -xtreg, fe-, though):

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. bysort idcode (year): gen control=1 if _n<=2

. replace control=0 if control==.

. xtreg ln_wage c.age##c.age i.control i.year, fe vce(cluster idcode)

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1216                                         min =          1
     Between = 0.1116                                         avg =        6.1
     Overall = 0.0917                                         max =         15

                                                F(17,4709)        =      85.89
corr(u_i, Xb) = 0.0670                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |    .061912   .0136461     4.54   0.000     .0351592    .0886647
             |
 c.age#c.age |  -.0008272   .0001091    -7.58   0.000    -.0010411   -.0006132
             |
   1.control |  -.0772361   .0076503   -10.10   0.000    -.0922342    -.062238
             |
        year |
         69  |   .0682551   .0154793     4.41   0.000     .0379085    .0986018
         70  |  -.0055708    .026743    -0.21   0.835    -.0579996     .046858
         71  |   .0126162   .0387052     0.33   0.744    -.0632641    .0884965
         72  |  -.0019206   .0505348    -0.04   0.970    -.1009924    .0971512
         73  |   -.015082    .062681    -0.24   0.810    -.1379661    .1078021
         75  |  -.0454041   .0861775    -0.53   0.598    -.2143523     .123544
         77  |  -.0286198   .1104378    -0.26   0.796    -.2451296    .1878899
         78  |   -.015028   .1229431    -0.12   0.903    -.2560539    .2259979
         80  |   -.034966   .1468988    -0.24   0.812    -.3229564    .2530245
         82  |  -.0360294   .1709214    -0.21   0.833    -.3711152    .2990565
         83  |  -.0215143   .1829495    -0.12   0.906    -.3801809    .3371523
         85  |   .0184454    .207216     0.09   0.929     -.387795    .4246858
         87  |   .0295929   .2318153     0.13   0.898    -.4248736    .4840594
         88  |   .0866033   .2476505     0.35   0.727    -.3989075    .5721142
             |
       _cons |   .6359801   .2465633     2.58   0.010     .1526008     1.11936
-------------+----------------------------------------------------------------
     sigma_u |  .40386342
     sigma_e |  .30036325
         rho |  .64386252   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Kind regards,
Carlo
(Stata 19.0)


Andreas Psarras

Join Date: Nov 2020

Posts: 10
#3

22 Feb 2022, 03:50

Hi Carlo,
thank you for your quick response. I saw your example; however, I think that it is different from what I want to do. Maybe I did not explain it well. The "treatment group" is calendar year 2020 and the "treatment period" covers the calendar months from March to December ("march_onwards"). The outcome y shows seasonality, so in the absence of COVID-19 (with lockdowns starting in March 2020) we would expect a trend similar to that of 2015-2019. That is why I want to use the same outcome in the previous calendar years (2015-2019) as a "control group" for the year 2020.
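
The comparison described here (March-December versus January-February, 2020 versus 2015-2019) corresponds to an interaction model that can be fitted without xtdidregress. The following is only a sketch of that specification, using the variable definitions given in #1. Whether the interaction coefficient can be read as a lockdown effect rests on the assumption that the change from January-February to March-December would have been the same in 2020 as in the earlier years.

```stata
* Declare the panel: areas is the panel id, time_my the monthly time variable
xtset areas time_my

* Conditional fixed-effects Poisson for the count outcome y.
* The coefficient on 1.march_onwards#1.year2020 is the interaction
* that the original post calls did.
xtpoisson y i.march_onwards##i.year2020, fe vce(robust)

* Linear counterpart, for comparison with the Poisson results
xtreg y i.march_onwards##i.year2020, fe vce(cluster areas)
```

The Poisson interaction coefficient is on the log scale, so it is read as an approximate proportional change rather than a change in counts.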
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17708
#4

22 Feb 2022, 04:00

Andreas:
what if you -xtset- your dataset with -areas- only?

Last edited by Carlo Lazzaro; 22 Feb 2022, 04:18.

Kind regards,
Carlo
(Stata 19.0)
Andreas Psarras

Join Date: Nov 2020

Posts: 10
#5

22 Feb 2022, 04:30

I tried it and got the same result. I also tried "xtset month", which returned some results, but I am not sure whether they are right.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#6

01 Mar 2022, 09:44

I think in #2, Carlo Lazzaro has made the most important point: you do not have a proper control group here. Without that, no analysis is going to work.

What you need for a DID analysis is a set of areas that had lockdowns and another set of areas that did not. You can reasonably restrict your data to March-December in each year--but that is not done by including a march_onwards variable in the model. That is done with an -if- clause or by just dropping the January and February observations before analyzing anything. Then you can do your DID based on the interaction of 2019 vs 2020 and lockdown areas vs non-lockdown areas.
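
The design described above could be sketched as follows. The variable lockdown (1 for areas that had a lockdown, 0 for areas that did not) is hypothetical: no such variable appears in the original post, and the data there may not contain any non-lockdown areas. The sketch also assumes that time_my is a Stata monthly date; year, mon and post are helper names introduced here.

```stata
* Assumption: time_my is a Stata monthly date; derive year and calendar month
generate int  year = yofd(dofm(time_my))
generate byte mon  = month(dofm(time_my))

* Restrict to March-December of 2019 and 2020,
* instead of putting march_onwards in the model
keep if inlist(year, 2019, 2020) & inrange(mon, 3, 12)

* post = 1 in 2020, 0 in 2019
generate byte post = (year == 2020)

* lockdown is a HYPOTHETICAL area-level indicator (see the note above).
* The DID estimate is the coefficient on 1.lockdown#1.post;
* Poisson is used because y is a count.
poisson y i.lockdown##i.post, vce(cluster areas)
```

The same restriction can be applied without dropping observations by moving the condition from keep into an if qualifier on the estimation command.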

I would also urge great caution on using this approach at all. What, precisely, is your definition of "lockdown?" The term has been used at different times and different places to refer to a highly heterogeneous set of actions taken, from the extremely stringent to the laughably porous, and almost everything imaginable in between. Moreover, different places imposed their "lockdowns" starting at different times and for different durations, and the incidence rates at the time of the lockdowns also vary greatly both within and between locations. Any analysis that does not properly account for all of this heterogeneity is doomed to produce useless and possibly misleading results. In short, I think that estimating the effect of "lockdowns" on any outcome at all is a horrendously complex undertaking and I do not believe it is amenable to simple regression-based approaches, if only because the number of confounding variables that need to be dealt with will rapidly exhaust the degrees of freedom in readily available data, and they probably cannot be dealt with in simple ways even in a massive data set.
Andreas Psarras

Join Date: Nov 2020

Posts: 10
#7

06 Mar 2022, 12:03

Prof. Schechter, thank you for your comments.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#8

07 Mar 2022, 05:57

Clyde Schechter is right here, Andreas Psarras. Trust me, lockdown studies are pretty much a nightmare: unless you've got a really well-defined lockdown like Wuhan, or some other really obvious treatment and control group comparison, the number of things going on is just wildly complex.

It's precisely for this reason that I switched to vaccine mandates and other better-defined policy areas for COVID policy. But either way, whatever approach we take, a control group is needed.
Andreas Psarras

Join Date: Nov 2020

Posts: 10
#9

09 Mar 2022, 12:28

In my case all areas had lockdowns at the same time. This kind of DiD has already been used and presented in other papers (Metcalfe et al. 2011).

Last edited by Andreas Psarras; 09 Mar 2022, 12:33.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#10

09 Mar 2022, 14:00

Precisely. No control group=no difference-in-differences.

Andreas Psarras, you need a group of units that never received the intervention.
Andreas Psarras

Join Date: Nov 2020

Posts: 10
#11

09 Mar 2022, 23:22

Jared, I use trends in the same variable, in earlier years (2015-19), as a control group.

Last edited by Andreas Psarras; 09 Mar 2022, 23:26.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#12

10 Mar 2022, 05:52

You're not listening to me: my point to you is that this kind of analysis is wrong.

Consider a case of two units, one treated, one untreated. In this situation, we can do what you want because we have a set of units that were never treated, in this case one. Bear in mind, the point of what we're doing is solving a missing data problem, where we attempt to impute the counterfactual.

We do this by comparing a treated unit's pre-intervention outcomes to a unit which did not get treated in the before or after period, hence us calling it a control unit. If you don't have units that are pure controls, if every unit in your sample gets the treatment, how can we know what the counterfactual is, since we observe every unit under treatment after T_0?
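
For the two-group, before-and-after case described above, the estimate can be written out explicitly. The notation is introduced here for illustration and is not from the thread: \bar{Y} denotes a group mean of the outcome, T and C the treated and control groups, and pre/post the periods before and after T_0.

```latex
\hat{\tau}_{\mathrm{DID}}
  = \left(\bar{Y}_{T,\mathrm{post}} - \bar{Y}_{T,\mathrm{pre}}\right)
  - \left(\bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}}\right)
```

The second bracket stands in for the change the treated group would have experienced without treatment. When every unit is treated after T_0, there is no separate group from which to compute it.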

Imagine an experiment giving everyone in each group the drug at the same time. We couldn't know if the drug worked because everyone was treated; there's nobody and nothing to compare them to.
Andreas Psarras

Join Date: Nov 2020

Posts: 10
#13

11 Mar 2022, 02:46

Jared, I am listening to what you are saying. This is not something that I came up with. There are articles based on this kind of control group (https://doi.org/10.1016/j.socscimed.2020.113101). As I already mentioned, the dependent variable shows seasonality, presenting the same trend in previous years (2015-19).