
  • Diff in diff with binary outcome!!

    hello everyone,
    here I am again with a question on difference-in-differences.
    I have seen that there are already similar questions on the forum but I have not found an answer that truly helps me.
    I want to perform a diff-in-diff with a dichotomous dependent variable, fertility.

    In addition, as the method requires, I have a time variable covering the two years of interest and a treatment variable that separates the treatment and control groups. I also have an interaction between time and treatment. My data are cross-sectional. I tried to do this with three different commands, based on previous questions asked here. Can you tell me which is the correct one? Can you also help me interpret the output? Thank you very much.

    1)
    reg fertility i.treated##i.time

    ----------------+---------+---------+---------+----------
    Outcome var.    | ferti~y | S. Err. |   |t|   |  P>|t|
    ----------------+---------+---------+---------+----------
    Before          |         |         |         |
      Control       |  0.192  |         |         |
      Treated       |  0.167  |         |         |
      Diff (T-C)    | -0.024  |  0.004  |  -6.04  | 0.000***
    After           |         |         |         |
      Control       |  0.221  |         |         |
      Treated       |  0.180  |         |         |
      Diff (T-C)    | -0.041  |  0.004  |  11.70  | 0.000***
                    |         |         |         |
    Diff-in-Diff    | -0.017  |  0.005  |   3.15  | 0.002***
    ----------------+---------+---------+---------+----------

    2)
    logistic fertility i.treated##i.time


       fertility | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         treated |
        treated  |   .8472373   .0227018    -6.19   0.000     .8038908     .892921
                 |
            time |
          y2014  |   1.196859    .023733     9.06   0.000     1.151235     1.24429
                 |
    treated#time |
       treated # |
          y2014  |   .9114572   .0319333    -2.65   0.008     .8509696    .9762442
                 |
           _cons |   .2370407   .0036075   -94.59   0.000     .2300745    .2442177

    3)
    margins i.treated#i.time


    ------------------------------------------------------------------------------------
                       |            Delta-method
                       |     Margin   std. err.      z    P>|z|     [95% conf. interval]
    -------------------+----------------------------------------------------------------
          treated#time |
     not treated#y2008 |   .1916191   .0023574    81.28   0.000     .1869987    .1962396
     not treated#y2014 |   .2210043   .0021885   100.98   0.000     .2167149    .2252937
         treated#y2008 |   .1672424   .0030715    54.45   0.000     .1612225    .1732624
         treated#y2014 |   .1797107   .0027496    65.36   0.000     .1743215    .1850999
    ------------------------------------------------------------------------------------







  • #2
    Your presentation of the results is a little confusing.

    1) is definitely not the output of -reg fertility i.treated##i.time-. It looks more like the output of one of the new DID commands. It does appear, however, to be based on a linear probability model, just like the command you showed for it. The interpretation is just a matter of reading the output. The Diff-in-Diff row of the table is your DID estimate of the effect of treatment, -.017, with standard error 0.005.

    Skipping ahead to 3), it is just a different way of displaying the results in 1). If you look at the before and after rows for the treated and control groups in 1), you will see that the numbers there are, except for rounding to 3 decimal places, the same numbers as in the output of 3), in a different order. Here the DID estimate of the treatment effect is not calculated for you; what you are looking at is the predicted probability of fertility in each group at each time in your study.

    2) is different as it is based on a logistic regression. The interpretation of this one is a bit more difficult because it is a non-linear model. If you worked out the details of doing that, it would probably lead to results similar to what you got in 1) and 3). Under the circumstances, I would recommend you go with 1) and 3). Linear probability models are only problematic if the predicted probabilities are close to 0 or 1. But the output in 3) shows that they are well bounded away from 0 and 1. So the linear probability model, being easier to work with, is the one I would go with.
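
    A minimal sketch of how 1) and 3) fit together, assuming treated and time are both coded 0/1 (the thread does not show the coding, so the 1.treated#1.time reference below is an assumption). The last line simply redoes the DID by hand from the four margins posted in 3).

    ```stata
    * Sketch only: assumes treated and time are coded 0/1.
    * Linear probability model with the full interaction
    regress fertility i.treated##i.time

    * Predicted probability in each group-by-year cell, as displayed in 3)
    margins treated#time

    * DID estimate: the interaction coefficient, as in the Diff-in-Diff row of 1)
    lincom 1.treated#1.time

    * Same quantity by hand from the four margins reported in 3):
    * (treated 2014 - treated 2008) - (control 2014 - control 2008)
    display (.1797107 - .1672424) - (.2210043 - .1916191)
    ```

    The hand calculation comes to about -.017, which agrees with the Diff-in-Diff row in 1).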



    • #3
      Based on my recent research and teaching, here is how I would do it. First, linear, then logit. I'm assuming treat is a binary variable indicating whether a unit belongs to the treated group and time is a binary variable set to zero in the first period, one in the second. For the nonlinear model, it really helps to define the time-varying treatment variable, which I call w. This variable should always be zero in the first period and one for the treated units in the second period. The parameter is the average treatment effect on the treated (in the second time period).

      Code:
      gen w = treat*time
      reg y treat time w, vce(robust)
      logit y treat time i.w, vce(robust)
      margins, dydx(w) subpop(if w == 1)
      As Clyde says, you'll probably find similar answers. However, in simulations I have found fairly large differences in more complicated settings between the linear and logit models.
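
      To make explicit what the margins call is doing, here is a rough manual version, assuming the same variable names (y, treat, time, w) as in the code above. The names xb_on, xb_off and te are hypothetical, introduced only for this illustration. For the treated units in the second period it compares the predicted probability with the treatment switched on against the predicted probability with it switched off.

      ```stata
      * Sketch only: y, treat, time and w are the names used in the code above;
      * xb_on, xb_off and te are made-up names for this illustration.
      logit y treat time i.w, vce(robust)

      * Linear index for treated units in the second period (w == 1),
      * once with the treatment effect included and once without it
      generate double xb_on  = _b[_cons] + _b[treat] + _b[time] + _b[1.w] if w == 1
      generate double xb_off = _b[_cons] + _b[treat] + _b[time] if w == 1

      * Difference in predicted probabilities, averaged over the treated observations
      generate double te = invlogit(xb_on) - invlogit(xb_off)
      summarize te
      ```

      The mean of te should agree with the point estimate that margins reports for w; the manual version does not reproduce the delta-method standard error.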



      • #4
        Hello. First of all, thank you for replying immediately, even if the tables are confusing.
        I am still figuring out how to upload the tables from Stata as recommended in the forum guidelines, but I have a deadline tomorrow, so I just copied and pasted them. I'm sorry.

        You are right, the first output results from this command:
        -diff fertility, t(treated) p(time)-.
        Anyway, using the command -reg fertility i.treated##i.time-, the coefficient is the same.

        About the interpretation, I would then say that the effect of the treatment in the treated group is a decrease of 0.017 in the outcome variable. Is that correct?

        In terms of methodology, is it correct to run a difference-in-differences analysis with a binary dependent variable? Does this violate any assumption, as far as you know?
        I have found conflicting information on this.

        Thank you so much

        Virginia


          • #6
            Originally posted by Jeff Wooldridge
            Based on my recent research and teaching, here is how I would do it. First, linear, then logit. I'm assuming treat is a binary variable indicating whether a unit belongs to the treated group and time is a binary variable set to zero in the first period, one in the second. For the nonlinear model, it really helps to define the time-varying treatment variable, which I call w. This variable should always be zero in the first period and one for the treated units in the second period. The parameter is the average treatment effect on the treated (in the second time period).

            Code:
            gen w = treat*time
            reg y treat time w, vce(robust)
            logit y treat time i.w, vce(robust)
            margins, dydx(w) subpop(if w == 1)
            As Clyde says, you'll probably find similar answers. However, in simulations I have found fairly large differences in more complicated settings between the linear and logit models.

            Thank you, Jeff. First, I do have the variables that you assume.
            I ran the commands you suggested. This is the table (I apologise again for the presentation):

            ------------------------------------------------------------------------------
                         |            Delta-method
                         |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                ttreated |
                      0  |          0  (empty)
                      1  |  -.0140748   .0053879    -2.61   0.009     -.024635   -.0035146
            ------------------------------------------------------------------------------

            Is this what you were talking about? How do I interpret it?

            Thank you for your help


            • #7
              As expected, the estimated effect is pretty close to the linear model. The linear model estimate is about -.017 and the logit is about -.014. So it is either a 1.7 or 1.4 percentage point drop in P(fertility = 1). The 95% confidence intervals exclude zero in both cases.
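
              As a rough check, the logit figure can also be rebuilt from the odds ratios posted in 2), assuming that output comes from the same saturated model. The scalar names below are hypothetical. The idea is to form the odds for treated units in 2014 with and without the interaction term, convert both to probabilities, and take the difference.

              ```stata
              * Sketch only: numbers copied from the logistic output in 2);
              * the scalar names are made up for this illustration.

              * Odds for treated units in 2014 if the interaction were absent:
              * baseline odds x treated odds ratio x y2014 odds ratio
              scalar odds_off = .2370407 * .8472373 * 1.196859

              * Odds for treated units in 2014 as estimated (interaction included)
              scalar odds_on = odds_off * .9114572

              * Convert odds to probabilities
              scalar p_off = odds_off / (1 + odds_off)
              scalar p_on  = odds_on / (1 + odds_on)

              display "P without treatment: " p_off
              display "P with treatment:    " p_on
              display "Difference:          " p_on - p_off
              ```

              The difference works out to about -.014, in line with the -.0140748 reported by margins, and the probability with treatment is about .1797, the treated#y2014 margin in 3).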

              In the future, please show us all the Stata commands and all of the output, in code delimiters (click on the # sign).



              • #8
                Thank you very much, this was really helpful!
