  • Difference-in-Differences AND panel data

    Hi Statalist,
    I am trying to estimate the decrease in a count variable caused by lockdowns during the COVID-19 pandemic, using the difference-in-differences method. I have a monthly dataset covering six years (2015-2020) for 57 areas. Lockdowns started in March 2020, so I want to compare the outcome y (a count variable) from March onwards in 2020 with the same months in the previous years (2015…19), using the first two months of each year as a control.
    I think that Poisson would be more appropriate; however, I also want to use didregress and compare the results.
    I use the following commands:

    xtset areas time_my
    xtdidregress ( y i.march_onwards i.year2020) (did), group(month) time(time_my) nogteffects

    where:
    areas=1…57
    month=1…12
    march_onwards takes 1 for March until December and 0 otherwise
    year2020 takes 1 for 2020 and 0 otherwise
    did=march_onwards*year2020
    time_my is a variable with month and year, e.g. 2015m1

    and I am getting “area not nested within month”

    I am stuck and I cannot understand what I am doing wrong.
    Last edited by Andreas Psarras; 22 Feb 2022, 02:01.
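
    One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: -xtdidregress- appears to expect each panel declared by -xtset- to fall inside a single level of the group() variable, whereas here every area is observed in all twelve calendar months. A minimal check, using the variable names from the post:

    ```stata
    * Sketch: count how many distinct values of month each area spans.
    * Variable names (areas, month) are taken from the post above.
    preserve
    * Keep one row per area-month pair
    bysort areas month: keep if _n == 1
    * Number of distinct months observed for each area
    bysort areas: gen n_months = _N
    * Keep one row per area
    bysort areas: keep if _n == 1
    * Any value above 1 means areas is not nested within month
    tabulate n_months
    restore
    ```

    If the table shows values above 1, areas is not nested within month, which would be consistent with the message quoted above.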

  • #2
    Andreas:
    I do not see a control group in your research description.
    That said, why not consider something along the lines of the following toy example (which uses -xtreg,fe-, though):
    Code:
    . use "https://www.stata-press.com/data/r17/nlswork.dta"
    (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
    
    . bysort idcode (year): gen control=1 if _n<=2
    
    . replace control=0 if control==.
    
    . xtreg ln_wage c.age##c.age i.control i.year, fe vce(cluster idcode)
    
    Fixed-effects (within) regression               Number of obs     =     28,510
    Group variable: idcode                          Number of groups  =      4,710
    
    R-squared:                                      Obs per group:
         Within  = 0.1216                                         min =          1
         Between = 0.1116                                         avg =        6.1
         Overall = 0.0917                                         max =         15
    
                                                    F(17,4709)        =      85.89
    corr(u_i, Xb) = 0.0670                          Prob > F          =     0.0000
    
                                 (Std. err. adjusted for 4,710 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             age |    .061912   .0136461     4.54   0.000     .0351592    .0886647
                 |
     c.age#c.age |  -.0008272   .0001091    -7.58   0.000    -.0010411   -.0006132
                 |
       1.control |  -.0772361   .0076503   -10.10   0.000    -.0922342    -.062238
                 |
            year |
             69  |   .0682551   .0154793     4.41   0.000     .0379085    .0986018
             70  |  -.0055708    .026743    -0.21   0.835    -.0579996     .046858
             71  |   .0126162   .0387052     0.33   0.744    -.0632641    .0884965
             72  |  -.0019206   .0505348    -0.04   0.970    -.1009924    .0971512
             73  |   -.015082    .062681    -0.24   0.810    -.1379661    .1078021
             75  |  -.0454041   .0861775    -0.53   0.598    -.2143523     .123544
             77  |  -.0286198   .1104378    -0.26   0.796    -.2451296    .1878899
             78  |   -.015028   .1229431    -0.12   0.903    -.2560539    .2259979
             80  |   -.034966   .1468988    -0.24   0.812    -.3229564    .2530245
             82  |  -.0360294   .1709214    -0.21   0.833    -.3711152    .2990565
             83  |  -.0215143   .1829495    -0.12   0.906    -.3801809    .3371523
             85  |   .0184454    .207216     0.09   0.929     -.387795    .4246858
             87  |   .0295929   .2318153     0.13   0.898    -.4248736    .4840594
             88  |   .0866033   .2476505     0.35   0.727    -.3989075    .5721142
                 |
           _cons |   .6359801   .2465633     2.58   0.010     .1526008     1.11936
    -------------+----------------------------------------------------------------
         sigma_u |  .40386342
         sigma_e |  .30036325
             rho |  .64386252   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    .
    Kind regards,
    Carlo
    (StataNow 18.5)


    • #3
      Hi Carlo,
      thank you for your quick response. I looked at your example; however, I think it is different from what I want to do. Maybe I did not explain it well. The "treatment group" is calendar year 2020 and the "treatment period" includes the calendar months from March to December ("march_onwards"). The outcome y shows seasonality, so in the absence of COVID-19 (with lockdowns starting in March 2020) we would expect a trend similar to that of 2015-2019. That is why I want to use the same outcome in the previous calendar years (2015-2019) as a "control group" for the year 2020.
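
      The specification described here could be sketched roughly as below. It assumes that time_my is a Stata monthly date (%tm) and that y is a non-negative count. A fixed-effects Poisson model stands in for the Poisson approach mentioned in #1. It only illustrates the interaction being described; whether that interaction can be read as a difference-in-differences effect is the question the rest of this thread disputes.

      ```stata
      * Sketch only; assumes time_my is a %tm monthly date and y is a count.
      xtset areas time_my
      * The term 1.march_onwards#1.year2020 is the interaction described in the post.
      * irr reports incidence-rate ratios instead of raw coefficients.
      xtpoisson y i.march_onwards##i.year2020, fe vce(robust) irr
      ```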


      • #4
        Andreas:
        what if you -xtset- your dataset with -areas- only?
        Last edited by Carlo Lazzaro; 22 Feb 2022, 05:18.
        Kind regards,
        Carlo
        (StataNow 18.5)


        • #5
          I tried it and got the same result. I also tried "xtset month", which returned some results, but I am not sure whether that is right.


          • #6
            I think in #2, Carlo Lazzaro has made the most important point: you do not have a proper control group here. Without that, no analysis is going to work.

            What you need for a DID analysis is a set of areas that had lockdowns and another set of areas that did not. You can reasonably restrict your data to March-December in each year--but that is not done by including a march_onwards variable in the model. That is done with an -if- clause or by just dropping the January and February observations before analyzing anything. Then you can do your DID based on the interaction of 2019 vs 2020 and lockdown areas vs non-lockdown areas.
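
            A minimal sketch of the design described above, under loudly stated assumptions: lockdown_area is a hypothetical 0/1 indicator (1 = area had a lockdown) that does not exist in the data described in #1, and time_my is assumed to be a Stata monthly date (%tm). It is an illustration of the layout, not a recommended analysis.

            ```stata
            * Sketch only. lockdown_area is a HYPOTHETICAL 0/1 variable; it is not in the original data.
            preserve
            * Recover the calendar year from the monthly date (assumes %tm format)
            gen int year = yofd(dofm(time_my))
            * Restrict to March-December of 2019 and 2020 with an -if- condition
            keep if inlist(year, 2019, 2020) & month >= 3
            * Treated observations: lockdown areas in 2020
            gen byte did = (year == 2020) & (lockdown_area == 1)
            xtset areas time_my
            * Areas are trivially nested within group(areas)
            xtdidregress (y) (did), group(areas) time(time_my)
            restore
            ```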

            I would also urge great caution on using this approach at all. What, precisely, is your definition of "lockdown?" The term has been used at different times and different places to refer to a highly heterogeneous set of actions taken, from the extremely stringent to the laughably porous, and almost everything imaginable in between. Moreover, different places imposed their "lockdowns" starting at different times and for different durations, and the incidence rates at the time of the lockdowns also vary greatly both within and between locations. Any analysis that does not properly account for all of this heterogeneity is doomed to produce useless and possibly misleading results. In short, I think that estimating the effect of "lockdowns" on any outcome at all is a horrendously complex undertaking and I do not believe it is amenable to simple regression-based approaches, if only because the number of confounding variables that need to be dealt with will rapidly exhaust the degrees of freedom in readily available data, and they probably cannot be dealt with in simple ways even in a massive data set.


            • #7
              Prof. Schechter, thank you for your comments.


              • #8
                Clyde Schechter is right here, Andreas Psarras. Trust me, lockdown studies are pretty much a nightmare. Unless you've got a really well-defined lockdown like Wuhan's, or some other really obvious treatment and control group comparison, the number of things going on is just wildly complex.

                It's precisely for this reason I switched to vaccine mandates and other better-defined policy areas for COVID policy. But either way, whatever approach we take, a control group is needed.


                • #9
                  In my case, all areas had lockdowns at the same time. This DiD has already been used and presented in other papers (Metcalfe et al. 2011).
                  Last edited by Andreas Psarras; 09 Mar 2022, 13:33.


                  • #10
                    Precisely. No control group = no difference-in-differences.

                    Andreas Psarras, you need a group of units that never received the intervention.


                    • #11
                      Jared, I use trends in the same variable, in earlier years (2015-19), as a control group.
                      Last edited by Andreas Psarras; 10 Mar 2022, 00:26.


                      • #12
                        You're not listening to me: my point to you is that this kind of analysis is wrong.

                        Consider a case of two units, one treated, one untreated. In this situation, we can do what you want because we have a set of units that were never treated, in this case one. Bear in mind, the point of what we're doing is solving a missing data problem, where we attempt to impute the counterfactual.

                        We do this by comparing a treated unit's pre-intervention outcomes to a unit which did not get treated in the before or after period, hence us calling it a control unit. If you don't have units that are pure controls, if every unit in your sample gets the treatment, how can we know what the counterfactual is, since we observe every unit under treatment after T_0?

                        Imagine an experiment giving everyone in each group the drug at the same time. We couldn't know if the drug worked because everyone was treated, there's nobody or no-thing to compare it to.
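
                        The comparison described above can be written compactly. In the standard two-group, two-period setting (the notation is introduced here for illustration and does not appear in the thread), with T denoting treated units, C denoting never-treated units, and bars denoting group means of the outcome:

                        ```latex
                        \hat{\delta}_{\mathrm{DiD}}
                          = \left( \bar{Y}_{T,\mathrm{post}} - \bar{Y}_{T,\mathrm{pre}} \right)
                          - \left( \bar{Y}_{C,\mathrm{post}} - \bar{Y}_{C,\mathrm{pre}} \right)
                        ```

                        When every unit is treated after T_0, there are no observations behind the second bracket, so the counterfactual change it is meant to estimate cannot be computed from the data.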


                        • #13
                          Jared, I am listening to what you're saying. This is not something that I came up with. There are articles based on this kind of control group (https://doi.org/10.1016/j.socscimed.2020.113101). As I already mentioned, the dependent variable shows seasonality, presenting the same trend in previous years (2015-19).
