
  • How do I do a difference-in-difference with matched ID's?

    Hello,
    I am trying to estimate a difference-in-differences effect of a certain change in different locations. Each location is identified by an ID number.
    As far as I know, to calculate a difference-in-difference effect I use:
    Code:
    reg y Period##Treatment, r
    My data are structured like this (as an example), where Treatment is 0 = control, 1 = treated group and Period is 0 = before, 1 = after:
    ID Treatment Period y
    1 0 0 2
    1 0 1 7
    1 1 0 9
    1 1 1 8
    2 0 0 1
    2 0 1 10
    2 1 0 4
    2 1 1 2
    Each ID number represents a district/location. Treatment distinguishes the control group from the treated group. Period indicates before vs. after the year in which the change took effect.
    What I want to do is calculate a difference-in-differences, but with the comparisons made between the groups within each ID number. Is there any way to do this?
    I thought maybe to add i.ID as a variable in my regression, but I am not sure if that is correct.
    Thanks

  • #2
    Well, what you actually have here is three-level data: observations at different times nested within some level that you have not identified in the data you show, and those in turn are nested in matched pairs that you refer to as districts or locations. You need an analysis that respects this hierarchy.

    To do that, you need a variable that identifies the intermediate level in the data. Extrapolating from the patterns I perceive in the data you show, I'm guessing that each id consists of a matched pair of what, for lack of a better word, I will call sublocations. One sublocation is in the treatment = 1 group, and the other is in the control group. (If this is not correct, the code below will not be either, so you will need to post back with a fuller explanation of the data.)

    So your analysis should look more like this:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id treatment period y)
    1 0 0  2
    1 0 1  7
    1 1 0  9
    1 1 1  8
    2 0 0  1
    2 0 1 10
    2 1 0  4
    2 1 1  2
    end
    
    //    IDENTIFY SUBLOCATIONS WITHIN ID-PAIRS
    egen int sublocation = group(id treatment)
    
    mixed y i.treatment##i.period || id: || sublocation:
    
    margins treatment#period
    margins period, dydx(treatment)
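
    As a small sketch of what the grouping step produces, assuming the example data and the sublocation variable created above are in memory: -egen, group()- assigns one integer to each distinct id-treatment combination, so the four sublocations can be inspected with a listing like this.

    ```stata
    * Assumes the example data above have been loaded and -sublocation- created.
    * Each distinct (id, treatment) pair should show its own sublocation code,
    * with two rows (period 0 and period 1) per sublocation.
    sort id treatment period
    list id treatment period sublocation, sepby(id)
    ```

    If the listing does not show exactly one treated and one control sublocation per id, the matched-pair assumption behind the model does not hold for the data.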



    • #3
      Hi Clyde,
      Thank you so much. Sorry for the late reply, I never got an email saying that someone responded.
      You are correct: each ID is a "location" that contains a treatment group and a control group, and each group has a before and an after observation.
      So you are saying I needed a variable that identifies "sublocations", based on the ID ("location") and on whether the group in that "location" is treatment or control?
      Lastly, I ran the commands and wanted to ask what the results from the "margins" commands represent. As far as I know, the "mixed" command shows the overall effect of the treatment, taking the "id" (location) and "sublocation" into consideration.

      EDIT: here is the output from the commands

      Code:
      input byte(id treatment period y)
      
                 id  treatm~t    period         y
        1. 1 0 0  2
        2. 1 0 1  7
        3. 1 1 0  9
        4. 1 1 1  8
        5. 2 0 0  1
        6. 2 0 1 10
        7. 2 1 0  4
        8. 2 1 1  2
        9. end
      
      . egen int sublocation = group(id treatment)
      
      . mixed y i.treatment##i.period || id: || sublocation:
      
      Performing EM optimization:
      
      Performing gradient-based optimization:
      
      Iteration 0:   log likelihood = -15.624051  
      Iteration 1:   log likelihood = -15.584372  
      Iteration 2:   log likelihood = -15.584208  
      Iteration 3:   log likelihood = -15.584208  
      
      Computing standard errors:
      
      Mixed-effects ML regression                     Number of obs     =          8
      
      -------------------------------------------------------------
                      |     No. of       Observations per Group
       Group Variable |     Groups    Minimum    Average    Maximum
      ----------------+--------------------------------------------
                   id |          2          4        4.0          4
          sublocation |          4          2        2.0          2
      -------------------------------------------------------------
      
                                                      Wald chi2(3)      =      48.38
      Log likelihood = -15.584208                     Prob > chi2       =     0.0000
      
      ----------------------------------------------------------------------------------
                     y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------+----------------------------------------------------------------
           1.treatment |          5   2.106537     2.37   0.018     .8712624    9.128738
              1.period |          7   1.030776     6.79   0.000     4.979715    9.020285
                       |
      treatment#period |
                  1 1  |       -8.5   1.457738    -5.83   0.000    -11.35711   -5.642886
                       |
                 _cons |        1.5   1.489547     1.01   0.314    -1.419458    4.419458
      ----------------------------------------------------------------------------------
      
      ------------------------------------------------------------------------------
        Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
      -----------------------------+------------------------------------------------
      id: Identity                 |
                        var(_cons) |   6.28e-20          .             .           .
      -----------------------------+------------------------------------------------
      sublocation: Identity        |
                        var(_cons) |      3.375   2.787563      .6686956     17.0341
      -----------------------------+------------------------------------------------
                     var(Residual) |     1.0625   .7513009      .2657288     4.24834
      ------------------------------------------------------------------------------
      LR test vs. linear model: chi2(2) = 3.46                  Prob > chi2 = 0.1777
      
      Note: LR test is conservative and provided only for reference.
      
      . margins treatment#period
      
      Adjusted predictions                            Number of obs     =          8
      
      Expression   : Linear prediction, fixed portion, predict()
      
      ----------------------------------------------------------------------------------
                       |            Delta-method
                       |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------+----------------------------------------------------------------
      treatment#period |
                  0 0  |        1.5   1.489547     1.01   0.314    -1.419458    4.419458
                  0 1  |        8.5   1.489547     5.71   0.000     5.580542    11.41946
                  1 0  |        6.5   1.489547     4.36   0.000     3.580542    9.419458
                  1 1  |          5   1.489547     3.36   0.001     2.080542    7.919458
      ----------------------------------------------------------------------------------
      
      . margins period, dydx(treatment)
      
      Conditional marginal effects                    Number of obs     =          8
      
      Expression   : Linear prediction, fixed portion, predict()
      dy/dx w.r.t. : 1.treatment
      
      ------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      0.treatment  |  (base outcome)
      -------------+----------------------------------------------------------------
      1.treatment  |
            period |
                0  |          5   2.106537     2.37   0.018     .8712624    9.128738
                1  |       -3.5   2.106537    -1.66   0.097    -7.628738    .6287376
      ------------------------------------------------------------------------------
      Note: dy/dx for factor levels is the discrete change from the base level.
      Last edited by Was Ud; 03 May 2018, 13:25.



      • #4
        Goodness! Are those your real data and results? Had I known you had a sample size of 8, I would never have recommended this approach.

        I'm going to assume this is just a simple example and that you really have a more substantial data set.

        As for the regression output, the part that is directly interpretable is the coefficient of the treatment#period interaction, -8.5 (95% CI -11.4 to -5.6). This is your difference-in-differences estimator of the effect of the treatment, and to the extent that the necessary conditions for a DID estimator to unbiasedly estimate causal effects are met, you have an estimate of the causal effect of treatment on y.

        The -margins- outputs provide more detail. The first -margins- output shows you the expected values of y in all four combinations of treated/untreated and periods 0 and 1. For example, in the treatment group (treatment = 1) in period 0, the expected value of y is 6.5 (95% CI 3.6 to 9.4). These statistics are usually of interest as background to the causal effect.

        The second -margins- output gives the difference between the expected values of y in the treated and untreated groups in each period. For example, in period 0, the expected value of y in the treatment group is 5 units greater than in the non-treatment group (95% CI 0.9 to 9.1), whereas in period 1, the expected value of y in the treatment group is 3.5 units less than in the non-treatment group (95% CI -7.6 to 0.6). These are, if you will, the differences, the difference between which constitutes the DID estimator.
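
        To make the link between the two -margins- tables and the interaction coefficient explicit, here is the arithmetic worked out from the posted output. The notation is only an illustrative assumption: \hat{y}_{tp} stands for the adjusted prediction with treatment = t and period = p from the first -margins- table.

        ```latex
        \begin{aligned}
        \text{DID} &= (\hat{y}_{11} - \hat{y}_{10}) - (\hat{y}_{01} - \hat{y}_{00})
                    = (5 - 6.5) - (8.5 - 1.5) = -1.5 - 7 = -8.5 \\
                   &= (\hat{y}_{11} - \hat{y}_{01}) - (\hat{y}_{10} - \hat{y}_{00})
                    = -3.5 - 5 = -8.5
        \end{aligned}
        ```

        The first line takes the change over time in each group and then differences across groups. The second line differences the two treated-minus-control gaps reported by the second -margins- command. Both reproduce the treatment#period coefficient of -8.5.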



        • #5
          Thank you for your explanation. This is not my real data or results. This is just an example.
          The data I'm working with is much more extensive with over 400 observations.

          Thanks again for your help, really appreciate it!



          • #6
            I wanted to ask a question. I have a panel of households (HH) over 9 periods, with 4 pre-treatment and 5 post-treatment periods. There is a policy change in period 4, so period 4 is the dividing point between pre and post. Intervention is a binary variable 0 & 1 for the rural loan. This value keeps on changing across periods. Outcome is income dependent on this loan. Can I use DiD with loan X Post as the interaction?



            • #7
              I cannot tell from your description of the problem. You refer to a "policy change" in period 4. You also refer to an "intervention...for the rural loan." Are these the same thing? Or does loan refer to events that perhaps occurred before the policy change, and perhaps something about the loans was affected by the policy? And what does "Outcome is income dependent on this loan" mean? That seems to imply that the outcome is necessarily 0 in the absence of a loan, and, depending on the answer to my first questions, perhaps it is necessarily 0 in the first four periods. Please clarify.



              • #8
                nishi malhotra, you can use whatever model your heart desires, provided you post back with an example of your real data using dataex or provide a better description of your problem.
