How do I do a difference-in-difference with matched ID's?

Was Ud

Join Date: Sep 2017

Posts: 8
#1

11 Apr 2018, 18:17

Hello,
I am trying to estimate a difference-in-differences effect of a certain change in different locations. Each location is identified by an ID number.
As far as I know, to calculate a difference-in-differences effect I use:

Code:

reg y Period##Treatment, r

My data is built like this (as an example), with Treatment coded 0 = control, 1 = treated group and Period coded 0 = before, 1 = after:

ID  Treatment  Period   y
 1      0         0     2
 1      0         1     7
 1      1         0     9
 1      1         1     8
 2      0         0     1
 2      0         1    10
 2      1         0     4
 2      1         1     2

Each ID number represents a district/location. Treatment distinguishes the control group from the treated group. Period distinguishes before from after the year in which the change took effect.
What I want to do is calculate a difference-in-differences, but with the comparisons between the groups made within each ID number. Is there any way to do this?
I thought of adding i.ID as a variable in my regression, but I am not sure whether that is correct.
Thanks
Clyde Schechter

Join Date: Apr 2014

Posts: 29801
#2

11 Apr 2018, 22:09

Well, what you actually have here is three-level data: observations at different times are nested within some level that you have not identified in the data you show, and those in turn are nested in matched pairs that you refer to as districts or locations. You need an analysis that respects this hierarchy.

To do that, you need a variable which identifies that intermediate level in the data. Extrapolating from the patterns I perceive in the data you show, I'm guessing that each id consists of a matched pair of what, for lack of a better word, I will call sublocations. One sublocation is in the treatment = 1 group, and the other is in the control group. (If this is not correct, the code below will not be either, so you will need to post back with a fuller explanation of the data.)

So your analysis should look more like this:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id treatment period y)
1 0 0  2
1 0 1  7
1 1 0  9
1 1 1  8
2 0 0  1
2 0 1 10
2 1 0  4
2 1 1  2
end

// IDENTIFY SUBLOCATIONS WITHIN ID-PAIRS
egen int sublocation = group(id treatment)

mixed y i.treatment##i.period || id: || sublocation:
margins treatment#period
margins period, dydx(treatment)
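
Since the code rests on a guess about the data structure, it may be worth checking that sublocation really identifies one control and one treated unit per id, each seen once before and once after. The following is only a sketch using the example data; the helper variables tag and n_sub are illustrative names, not part of the analysis.

```stata
clear
input byte(id treatment period y)
1 0 0  2
1 0 1  7
1 1 0  9
1 1 1  8
2 0 0  1
2 0 1 10
2 1 0  4
2 1 1  2
end

* group() numbers the distinct (id, treatment) combinations 1, 2, 3, ...
egen int sublocation = group(id treatment)

* Show which sublocation each observation was assigned to
sort id treatment period
list id treatment period sublocation, sepby(id)

* Each sublocation should have exactly one observation per period
isid sublocation period
bysort sublocation: assert _N == 2

* Each id should contain exactly two sublocations (one control, one treated)
egen byte tag = tag(id sublocation)
egen byte n_sub = total(tag), by(id)
assert n_sub == 2
```

If any of these checks stops with an error on the real data, the matched-pair assumption behind the model does not hold as stated and the model specification would need to be revisited.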

Was Ud

Join Date: Sep 2017
Posts: 8
#3

03 May 2018, 13:17

Hi Clyde,
Thank you so much. Sorry for the late reply; I never got an email saying that someone had responded.
You are correct: each ID is a "location" that contains a treatment group and a control group, and each group has a before and an after observation.
So you are saying I needed a variable that identifies "sublocations", defined by the ID ("location") and by whether the group in that "location" is treatment or control?
Lastly, I ran the commands and wanted to ask what the results from the "margins" commands represent. As far as I know, the "mixed" command shows the overall effect of the treatment, taking the "id" (location) and "sublocation" into consideration.

EDIT: here is the output from the commands

Code:

input byte(id treatment period y)

           id  treatm~t    period         y
  1. 1 0 0  2
  2. 1 0 1  7
  3. 1 1 0  9
  4. 1 1 1  8
  5. 2 0 0  1
  6. 2 0 1 10
  7. 2 1 0  4
  8. 2 1 1  2
  9. end

. egen int sublocation = group(id treatment)

. mixed y i.treatment##i.period || id: || sublocation:

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log likelihood = -15.624051  
Iteration 1:   log likelihood = -15.584372  
Iteration 2:   log likelihood = -15.584208  
Iteration 3:   log likelihood = -15.584208  

Computing standard errors:

Mixed-effects ML regression                     Number of obs     =          8

-------------------------------------------------------------
                |     No. of       Observations per Group
 Group Variable |     Groups    Minimum    Average    Maximum
----------------+--------------------------------------------
             id |          2          4        4.0          4
    sublocation |          4          2        2.0          2
-------------------------------------------------------------

                                                Wald chi2(3)      =      48.38
Log likelihood = -15.584208                     Prob > chi2       =     0.0000

----------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
     1.treatment |          5   2.106537     2.37   0.018     .8712624    9.128738
        1.period |          7   1.030776     6.79   0.000     4.979715    9.020285
                 |
treatment#period |
            1 1  |       -8.5   1.457738    -5.83   0.000    -11.35711   -5.642886
                 |
           _cons |        1.5   1.489547     1.01   0.314    -1.419458    4.419458
----------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   6.28e-20          .             .           .
-----------------------------+------------------------------------------------
sublocation: Identity        |
                  var(_cons) |      3.375   2.787563      .6686956     17.0341
-----------------------------+------------------------------------------------
               var(Residual) |     1.0625   .7513009      .2657288     4.24834
------------------------------------------------------------------------------
LR test vs. linear model: chi2(2) = 3.46                  Prob > chi2 = 0.1777

Note: LR test is conservative and provided only for reference.

. margins treatment#period

Adjusted predictions                            Number of obs     =          8

Expression   : Linear prediction, fixed portion, predict()

----------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
treatment#period |
            0 0  |        1.5   1.489547     1.01   0.314    -1.419458    4.419458
            0 1  |        8.5   1.489547     5.71   0.000     5.580542    11.41946
            1 0  |        6.5   1.489547     4.36   0.000     3.580542    9.419458
            1 1  |          5   1.489547     3.36   0.001     2.080542    7.919458
----------------------------------------------------------------------------------

. margins period, dydx(treatment)

Conditional marginal effects                    Number of obs     =          8

Expression   : Linear prediction, fixed portion, predict()
dy/dx w.r.t. : 1.treatment

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0.treatment  |  (base outcome)
-------------+----------------------------------------------------------------
1.treatment  |
      period |
          0  |          5   2.106537     2.37   0.018     .8712624    9.128738
          1  |       -3.5   2.106537    -1.66   0.097    -7.628738    .6287376
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Last edited by Was Ud; 03 May 2018, 13:25.


Clyde Schechter

Join Date: Apr 2014

Posts: 29801
#4

03 May 2018, 13:37

Goodness! Are those your real data and results? Had I known you had a sample size of 8, I would never have recommended this approach.

I'm going to assume this is just a simple example and that you really have a more substantial data set.

As for the regression output, the part that is directly interpretable is the coefficient of the treatment#period interaction, -8.5 (95% CI -11.4 to -5.6). This is your difference-in-differences estimator of the effect of the treatment, and to the extent that the necessary conditions for a DID estimator to unbiasedly estimate causal effects are met, you have an estimate of the causal effect of treatment on y.
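
To see where the -8.5 comes from, it can be reproduced by hand from the raw cell means. This is only a sketch for the example data: it works here because the design is balanced, so the fixed-effects estimates coincide with the raw means, which will generally not hold exactly in unbalanced data. The scalar names (m00, m01, m10, m11, change_treated, change_control, did) are illustrative.

```stata
clear
input byte(id treatment period y)
1 0 0  2
1 0 1  7
1 1 0  9
1 1 1  8
2 0 0  1
2 0 1 10
2 1 0  4
2 1 1  2
end

* Mean of y in each treatment-by-period cell, stored as m<treatment><period>
forvalues t = 0/1 {
    forvalues p = 0/1 {
        quietly summarize y if treatment == `t' & period == `p'
        scalar m`t'`p' = r(mean)
    }
}

* Before-to-after change within each group
scalar change_treated = m11 - m10   // 5 - 6.5
scalar change_control = m01 - m00   // 8.5 - 1.5

* Difference-in-differences: treated change minus control change
scalar did = change_treated - change_control

display "Change in treated group: " change_treated
display "Change in control group: " change_control
display "DID estimate:            " did
```

The treated group changes by -1.5 and the control group by 7, so the difference between the two changes is -8.5, matching the interaction coefficient.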

The -margins- outputs provide more detail. The first -margins- output shows you the expected values of y in all four combinations of treated/untreated and periods 0 and 1. For example, in the treatment group (treatment = 1) in period 0, the expected value of y is 6.5 (95% CI 3.6 to 9.4). These statistics are usually of interest as background to the causal effect.

The second -margins- output gives the difference between the expected values of y in the treated and untreated groups in each period. For example, in period 0, the expected value of y in the treatment group is 5 units greater than in the non-treatment group (95% CI 0.9 to 9.1), whereas in period 1, the expected value of y in the treatment group is 3.5 units less than in the non-treatment group (95% CI -7.6 to 0.6). These are, if you will, the differences, the difference between which constitutes the DID estimator.
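
The same two differences can also be recovered directly from the model coefficients with -lincom-, which makes the link to the interaction term explicit. This is a sketch on the example data, assuming the -mixed- model converges as in the output posted above; the scalar names diff_p0 and diff_p1 are illustrative.

```stata
clear
input byte(id treatment period y)
1 0 0  2
1 0 1  7
1 1 0  9
1 1 1  8
2 0 0  1
2 0 1 10
2 1 0  4
2 1 1  2
end

egen int sublocation = group(id treatment)
quietly mixed y i.treatment##i.period || id: || sublocation:

* Treated-minus-control difference in period 0: the main effect of treatment
lincom 1.treatment
scalar diff_p0 = r(estimate)

* Treated-minus-control difference in period 1: main effect plus interaction
lincom 1.treatment + 1.treatment#1.period
scalar diff_p1 = r(estimate)

* The difference between the two differences is the interaction coefficient
display "Difference of differences: " diff_p1 - diff_p0
display "Interaction coefficient:   " _b[1.treatment#1.period]
```

Here the period 1 difference (-3.5) minus the period 0 difference (5) gives -8.5, the DID estimate.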
Was Ud

Join Date: Sep 2017

Posts: 8
#5

03 May 2018, 16:43

Thank you for your explanation. This is not my real data or results. This is just an example.
The data I'm working with is much more extensive with over 400 observations.

Thanks again for your help, really appreciate it!
nishi malhotra

Join Date: Mar 2022

Posts: 2
#6

21 Mar 2022, 17:21

I wanted to ask a question. I have a panel of households (HH) observed over 9 periods, with 4 pre-treatment and 5 post-treatment periods. There is a policy change in period 4, so period 4 is the convenient dividing point between pre and post. Intervention is a binary variable (0 or 1) for the rural loan; this value keeps changing across periods. Outcome is income dependent on this loan. Can I use DiD with loan X Post as the interaction?
Clyde Schechter

Join Date: Apr 2014

Posts: 29801
#7

21 Mar 2022, 19:50

I cannot tell from your description of the problem. You refer to a "policy change" in period 4. You also refer to an "intervention...for the rural loan." Are these the same thing? Or does loan refer to events that perhaps occurred before the policy change, and perhaps something about the loans was affected by the policy? And what does "Outcome is income dependent on this loan" mean? That seems to imply that the outcome is necessarily 0 in the absence of a loan, and, depending on the answer to my first questions, perhaps it is necessarily 0 in the first four periods. Please clarify.
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#8

21 Mar 2022, 20:40

nishi malhotra, you can use whatever model your heart desires, provided you post back with an example of your real data using dataex or provide a better description of your problem.