Difference-in-difference with panel data and continuous treatment

Steffen Mauch

Join Date: Dec 2021
Posts: 37

08 Feb 2022, 09:20

Hi everyone,

I'm trying to implement a difference-in-difference (DD) analysis using panel data; a sample is shown below. The data cover about 2000 microentrepreneurs (id) over 2 time periods (wave). I would like to estimate the effect of the length of the covid lockdown on the uptake of Mobile Money (MM), i.e. financial transactions via smartphone. I use Stata 15.

Code:

. list id wave pre_post anyMMactivity_conistent q35_covid treatment_2 if id <= 10

      +-------------------------------------------------------+
      | id   wave   pre_post   anyMMa~t   q35_co~d   treatm~2 |
      |-------------------------------------------------------|
   1. |  1      1          1          1          0          1 |
   2. |  1      2          0          1         12          1 |
   3. |  2      1          1          0          0          1 |
   4. |  2      2          0          1         20          1 |
   5. |  3      1          1          1          0          1 |
      |-------------------------------------------------------|
   6. |  3      2          0          1          1          1 |
   7. |  4      1          1          1          0          1 |
   8. |  4      2          0          1          8          1 |
   9. |  5      1          1          0          0          1 |
  10. |  5      2          0          1          1          1 |
      |-------------------------------------------------------|
  11. |  6      1          1          1          0          0 |
  12. |  6      2          0          1          0          0 |
  13. |  7      1          1          1          0          1 |
  14. |  7      2          0          1         20          1 |
  15. |  8      1          1          1          0          1 |
      |-------------------------------------------------------|
  16. |  8      2          0          1         12          1 |
  17. |  9      1          1          0          .          0 |
  18. |  9      2          0          .          .          0 |
  19. | 10      1          1          0          0          1 |
  20. | 10      2          0          1          8          1 |
      +-------------------------------------------------------+

The variable q35_covid records the length of the lockdown, in weeks, for each entrepreneur. Since lockdown length is only observed after covid hit (i.e. only in wave 2), I set q35_covid to zero in wave 1. I hope that does not cause any problems.

anyMMactivity_conistent indicates whether an individual uses MM and is thus binary.

I created a pre_post variable which is 1 for wave 1 and 0 for wave 2.

Furthermore, I created a treatment variable (called treatment_2) which is 0 if zero weeks were spent in lockdown and 1 if >0 weeks were spent in lockdown.
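
For reference, the two indicators described above could be rebuilt and cross-checked roughly as follows. This is only a sketch: pre_chk, wk_max and treat_chk are hypothetical names, and carrying the wave-2 lockdown length to both waves is an assumption inferred from the listing, where treatment_2 is constant within id.

```stata
* Sketch only; pre_chk, wk_max and treat_chk are hypothetical names.
* Assumes wave is coded 1/2 and q35_covid is weeks in lockdown (0 in wave 1).
gen byte pre_chk = (wave == 1)               // 1 = wave 1, 0 = wave 2
bysort id: egen wk_max = max(q35_covid)      // egen max() ignores missing values
gen byte treat_chk = (wk_max > 0) if !missing(wk_max)

* Compare with the existing variables
tab pre_chk pre_post, missing
tab treat_chk treatment_2, missing
```

In the listing, id 9 has q35_covid missing in both waves but treatment_2 equal to 0, so the second cross-tabulation may show cases where treat_chk is missing and treatment_2 is not.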

I first set up DD in a pooled OLS way ...

Code:

reg anyMMactivity_conistent i.pre_post##i.treatment_2, r

... which gives me the following output:

Code:

. reg anyMMactivity_conistent i.pre_post##i.treatment_2, r //Robust std. errors

Linear regression                               Number of obs     =      4,152
                                                F(3, 4148)        =     391.99
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2034
                                                Root MSE          =     .41437

-------------------------------------------------------------------------------
              |               Robust
anyMMactivi~t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
   1.pre_post |  -.3872571   .0235074   -16.47   0.000    -.4333441   -.3411701
1.treatment_2 |   .0714418   .0165586     4.31   0.000     .0389781    .1039054
              |
     pre_post#|
  treatment_2 |
         1 1  |  -.0368784   .0279018    -1.32   0.186    -.0915808     .017824
              |
        _cons |   .8530466   .0149958    56.89   0.000     .8236469    .8824463
-------------------------------------------------------------------------------

I have the following problems. First, the coefficients appear nonsensical to me, which makes me wonder whether I am interpreting them correctly. Second, as the term of interest is insignificant, I wonder whether my design is flawed. Is it a problem that my outcome variable is binary? Should I use a probit or a logit model? Can a DD design be implemented with probit or logit in Stata?

My third question concerns the treatment variable. I wonder whether it is possible to exploit the heterogeneity in q35_covid. Is there a DD design with continuous treatment in Stata? I tried the following ...

Code:

reg anyMMactivity_conistent i.pre_post##c.q35_covid, r

which gives me the following result:

Code:

. reg anyMMactivity_conistent i.pre_post##c.q35_covid, r //Robust std. errors    
>          
note: 1.pre_post#c.q35_covid omitted because of collinearity

Linear regression                               Number of obs     =      3,950
                                                F(2, 3947)        =     490.31
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1959
                                                Root MSE          =     .41019

------------------------------------------------------------------------------
             |               Robust
anyMMactiv~t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  1.pre_post |  -.3870294   .0152382   -25.40   0.000     -.416905   -.3571539
   q35_covid |   .0021529   .0008803     2.45   0.015      .000427    .0038787
             |
    pre_post#|
 c.q35_covid |
          1  |          0  (omitted)
             |
       _cons |   .8872826   .0102726    86.37   0.000     .8671426    .9074226
------------------------------------------------------------------------------

Why is the interaction term omitted here?

Looking forward to your answers.

Tags: continuous treatment, difference-in-difference, panel data, Pooled OLS

Steffen Mauch

Join Date: Dec 2021

Posts: 37
#2

15 Feb 2022, 11:54

For some of my questions I already found answers.

1) The interpretation of the coefficients makes sense, but xtreg should be used when performing a DD analysis with panel data (https://www.statalist.org/forums/for...s-reg-vs-xtreg)

2) Regarding my binary outcome, the linear model (xtreg) is acceptable as long as the predicted outcomes stay within the unit interval; otherwise xtlogit can be applied (https://www.statalist.org/forums/for...iff-with-xtreg)
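
A minimal sketch of what points 1) and 2) could look like in practice, assuming id and wave identify the panel and wave is coded 1/2 (so 2.wave plays the role of the post-lockdown indicator). xb_hat is a hypothetical variable name; this is one possible setup, not necessarily what the linked threads recommend.

```stata
* Sketch only; assumes id and wave identify the panel, wave coded 1/2.
xtset id wave

* Linear probability model with entrepreneur fixed effects.
* treatment_2 is constant within id, so its main effect is absorbed
* by the fixed effects; 2.wave#1.treatment_2 is the DD term.
xtreg anyMMactivity_conistent i.wave##i.treatment_2, fe vce(cluster id)

* Linear prediction (excluding the fixed effect): inspect its range
predict double xb_hat, xb
summarize xb_hat

* Alternative: conditional fixed-effects logit, which uses only ids
* whose outcome changes between the two waves.
xtlogit anyMMactivity_conistent i.wave##i.treatment_2, fe
```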

That leaves only my third question unanswered: is it possible to take advantage of the heterogeneity in q35_covid, i.e. my underlying treatment variable? Put differently: how can a DD analysis be set up with a continuous treatment so that it takes different treatment intensities into account?
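
One possible setup, sketched here under assumptions rather than as a definitive answer, is to treat the wave-2 lockdown length as a fixed characteristic of each entrepreneur and interact it with the post-lockdown wave. Because q35_covid is 0 whenever pre_post is 1, the product of pre_post and q35_covid is zero in every row, which is why Stata drops that interaction; copying the wave-2 value to both waves avoids this. lock_weeks is a hypothetical name, and wave is assumed to be coded 1/2.

```stata
* Sketch only; lock_weeks is a hypothetical name, wave assumed coded 1/2.
xtset id wave
bysort id: egen lock_weeks = max(q35_covid)   // wave-2 weeks, copied to both waves

* 2.wave#c.lock_weeks: difference in the wave-1-to-wave-2 change in MM use
* associated with one additional week of lockdown (linear in weeks).
* c.lock_weeks itself is constant within id and is absorbed by the fixed effects.
xtreg anyMMactivity_conistent i.wave##c.lock_weeks, fe vce(cluster id)
```

This specification assumes the effect is linear in the number of weeks and that entrepreneurs with different lockdown lengths would otherwise have followed parallel trends in MM use.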