
  • Dynamic Diff-in-Diff: year and individual fixed effects

    Hi Statalist,

    I have an unbalanced panel with 800,000 observations on around 50,000 workers over 27 years (1985-2012).
    I would like to evaluate a policy introduced in 1995, not with a standard diff-in-diff but with a dynamic one, since I expect the treatment to have an effect over the entire life cycle (from 1996 to 2012). Basically, I need the POST_TREAT coefficients for all 27 years: the idea is to build a model that detects the difference between treated and control workers in each specific year k (from 1985 to 2012) relative to the reform year (1995), ideally with the 1995 coefficient set to 0.

    So far, I tried to perform this:

    xtreg Y i.year i.year##treatment, fe (cluster worker)

    where Y is the outcome, i.year gives the year fixed effects, and i.year##treatment contains the interaction terms for all the years, with standard errors clustered at the individual (worker) level.

    In this case, the last term gives me an interaction coefficient for every year.

    My questions are:
    1) Does the model seem reasonable?
    2) Is it right to say that, thanks to the fixed-effects estimator, I have already controlled for individual fixed effects?
    3) If I run OLS including the individual fixed effects (i.worker), should I get the same results?

    Thank you!

    Nicolò

  • #2
    Yes to all three questions. (For question 3, the answer is yes only because you are doing a linear regression.)

    Note: You cannot run -xtreg Y i.year i.year##treatment, fe (cluster worker)-; it's got a syntax error. I assume you mean -xtreg Y i.year i.year##treatment, fe vce(cluster worker)-.
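A minimal sketch of the corrected command for the event-study goal in #1, run on a small simulated panel so that it is self-contained. The variable names Y, worker, year and treatment follow the post; the simulated data, the seed, and the effect size of 2 after 1995 are hypothetical. Using ib1995.year makes 1995 the base year, so each year#1.treatment coefficient is the treated-control gap in that year relative to 1995, with the 1995 gap fixed at 0. Since i.year##i.treatment already contains i.year, listing i.year separately is redundant.

```stata
* Hypothetical simulated panel: 1,000 workers observed every year 1985-2012
clear
set seed 2017
set obs 1000
generate worker = _n
generate treatment = (worker <= 500)       // time-invariant 0/1 indicator
generate alpha = rnormal()                 // worker fixed effect
expand 28
bysort worker: generate year = 1984 + _n   // 1985, ..., 2012
* Common trend plus an illustrative effect of 2 for treated workers after 1995
generate Y = alpha + 0.1*(year - 1985) + 2*treatment*(year > 1995) + rnormal()

xtset worker year
* 1995 is the omitted year, so every year#treatment coefficient is relative to 1995.
* The treatment main effect is omitted because it is collinear with the worker fixed effects.
xtreg Y ib1995.year##i.treatment, fe vce(cluster worker)
```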



    • #3
      Dear Clyde,

      Thank you for the answer. Yes, I meant vce; thanks for the correction.

      However, I tried to run OLS with individual fixed effects, something like:
      reg Y i.year i.worker i.year##treatment, vce(cluster worker)
      but it gives me error r(103), too many variables specified.

      I suppose the error arises because of the creation of the individual dummies i.worker.
      So I tried to set a higher limit on the number of variables, but Stata told me that I cannot do that without losing my data. I tried saving the data and adding the option "clear" at the end, but nothing happens.

      What can I do?

      Thank you very much!



      • #4
        Well, depending on the size of your data set, the number of workers in it, and the flavor of Stata you are running, it simply may not be possible.

        Moreover, while it will produce the same results as -xtreg, fe-, it will also generate an enormous amount of mostly useless output, such as the coefficient of each worker fixed effect. So my advice is to just skip it. It serves no purpose anyway.
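If the dummy-variable (LSDV) point estimates are wanted anyway, one option is Stata's official -areg- command, which absorbs the worker indicators instead of estimating and listing them, so no worker dummies have to be created. A minimal sketch on hypothetical simulated data (seed, sample size and effect size are made up) checks that -areg- reproduces the -xtreg, fe- coefficient; the standard errors can differ slightly because the two commands apply different small-sample adjustments.

```stata
* Hypothetical simulated panel: 200 workers, years 1990-2000, reform in 1995
clear
set seed 101
set obs 200
generate worker = _n
generate treatment = (worker <= 100)
generate alpha = rnormal()                 // worker fixed effect
expand 11
bysort worker: generate year = 1989 + _n   // 1990, ..., 2000
generate Y = alpha + 2*treatment*(year > 1995) + rnormal()
xtset worker year

* Within (fixed-effects) estimator
xtreg Y ib1995.year##i.treatment, fe vce(cluster worker)
scalar b_xtreg = _b[2000.year#1.treatment]

* Same model with the worker indicators absorbed rather than listed
areg Y ib1995.year##i.treatment, absorb(worker) vce(cluster worker)
scalar b_areg = _b[2000.year#1.treatment]

display "xtreg, fe: " b_xtreg "   areg: " b_areg
```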



        • #5
          Thank you very much Clyde.

          Last thing. What exactly is the difference between these two regressions?
          1. xtreg Y i.year i.year##treatment, fe vce(cluster worker)
          2. xtreg Y i.year i.year#treatment, fe vce(cluster worker)

          The results are really different. If I'm right, the interaction in 1. captures the difference in the impact on Y between treatment and control, while the interaction in 2. "simply" captures the difference in Y between treatment and control. Is that right?

          Thank you,
          Nicolò



          • #6
            The results are really different because they are very different models. Your interpretation of #1 is correct. #2 is simply a mis-specified model for most circumstances (including yours, to the extent I understand your goals) and your interpretation of it is incorrect.

            The second regression omits the "main" effects of year and treatment. So you have "naked" interaction terms which are, typically, meaningless. But just for the exercise, here's how you would interpret them. You have a large number of years, so let's say that the base year is 2000, and then regression 2 will give you outputs for 2001.year#treatment, 2002.year#treatment, etc. So here's what the coefficient of 2001.year#treatment represents: it is the difference between the expected outcome in the treatment group in 2001 and the expected outcome of all entities (regardless of treatment group) in any year other than 2001 combined with the control group in 2001. That comparison group is obviously quite heterogeneous both with respect to entities and times, so this contrast is hardly ever of interest. I don't see any way it would be of use in your situation.
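In Stata's factor-variable notation, i.year##i.treatment is shorthand for i.year i.treatment i.year#i.treatment, whereas i.year#i.treatment contributes only the interaction terms; regression 2 in #5 is therefore the written-out form with i.treatment left out. A minimal sketch on hypothetical simulated data fits the ## form and the fully written-out form and checks that they give identical coefficients.

```stata
* Hypothetical simulated panel: 200 workers, years 1990-2000
clear
set seed 11
set obs 200
generate worker = _n
generate treatment = (worker <= 100)
generate alpha = rnormal()                 // worker fixed effect
expand 11
bysort worker: generate year = 1989 + _n   // 1990, ..., 2000
generate Y = alpha + 2*treatment*(year > 1995) + rnormal()
xtset worker year

* ## form: main effects plus interaction in one term
xtreg Y i.year##i.treatment, fe
scalar b_hash2 = _b[2000.year#1.treatment]

* The same model written out term by term
xtreg Y i.year i.treatment i.year#i.treatment, fe
scalar b_expanded = _b[2000.year#1.treatment]
```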

            Added: I can't emphasize strongly enough how much easier, and less error-prone, it is to interpret the output of -margins- run after the regression, than it is to directly interpret the regression output for an interaction model. The meanings of the terms in the interaction model are not intuitive and people typically have trouble understanding them. Even if you find them quite congenial, whoever your audience for this work is probably won't. The expected outcomes in each group, and the marginal effects of treatment in each time period are the "paydirt," and are what people will readily understand. While you can calculate them from the regression output using -predict- and -lincom-, the process is tedious and the probability of making a mistake is quite high. That's what -margins- is for. Do use it. If you are not familiar with -margins-, I recommend getting started by reading Richard Williams's excellent notes at https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. They are simpler than the official Stata documentation, and their examples focus on the kind of problem you are working on. Once you've got that under your belt, you can learn about the more advanced features -margins- has on offer from the official PDF documentation that came with your Stata installation.
            Last edited by Clyde Schechter; 10 Dec 2017, 13:46.
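A minimal sketch of the -margins- route for year-by-year treatment effects, on hypothetical simulated data with 1995 as the base year. After -xtreg, fe- the default prediction is the linear prediction xb, which excludes the worker fixed effects, and the treatment main effect is omitted as collinear with them; so these year-specific effects are treated-control gaps relative to 1995, and -margins- may flag them as not estimable unless the noestimcheck option is added. Whether suppressing that check is appropriate is a judgment call for the analyst.

```stata
* Hypothetical simulated panel: 500 workers, years 1990-2000, reform in 1995
clear
set seed 3
set obs 500
generate worker = _n
generate treatment = (worker <= 250)
generate alpha = rnormal()                 // worker fixed effect
expand 11
bysort worker: generate year = 1989 + _n   // 1990, ..., 2000
generate Y = alpha + 2*treatment*(year > 1995) + rnormal()
xtset worker year

xtreg Y ib1995.year##i.treatment, fe vce(cluster worker)
* Effect of treatment (1 versus 0) on the linear prediction in each year.
* noestimcheck skips the estimability check that the omitted treatment
* main effect can trigger; the 1995 effect is 0 by construction.
margins year, dydx(treatment) noestimcheck
```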



            • #7
              Hi Clyde, I found the link you posted really useful, and it solved that problem. But I have another question about interpretation.
              What is the difference in the interpretation of the coefficients in these two regressions?

              1) xtreg Y i.year i.year##treatment, fe cluster (worker), with one regression for any specific year t
              2) xtreg Y i.year year_t_dummy##treatment, fe cluster (worker), with one regression for any specific year t AND a dummy variable for the year t (0 not in t, 1 in t).

              I had always assumed that a "typical" diff-in-diff needs a classical interaction between treatment and a year dummy (regression 2). So what is the difference between an interaction with a year fixed effect and one with a "simple" year dummy?

              Thank you very much!



              • #8
                I don't understand your explanation of year_t_dummy, so I can't comment. I also don't understand what you mean by "one regression for any specific year t." Perhaps showing example data (use -dataex-) and code and output (use code delimiters) would make it clearer.



                • #9
                  Sorry, Clyde, for the unclear explanation.

                  Suppose I have a reform in 1995 and I would like to understand its effect in, for example, 1998.

                  I can use:
                  1) xtreg Y i.1998 i.1998##treatment, fe cluster (worker)

                  Alternatively, I can create two more variables: year_1998 (1 if the year is 1998 and 0 otherwise) and interaction_1998 = year_1998 * treatment.

                  Then I run:
                  2) xtreg Y i.1998 interaction_1998, fe cluster (worker)

                  I ran these two regressions and got different results. Why? And how should the two interaction coefficients be interpreted?

                  Hope this is clear



                  • #10
                    I can use:
                    1) xtreg Y i.1998 i.1998##treatment, fe cluster (worker)
                    No, you can't. i.1998 is a syntax error.

                    What you can run is:

                    Code:
                    xtreg Y 1998.year 1998.year##treatment, fe cluster(worker)
                    You could instead create a separate indicator variable for year 1998 and an interaction term of that with treatment, and then run the code
                    Code:
                    xtreg Y i.year_1998 interaction_1998, fe cluster(worker)
                    The output from -xtreg- will be the same for both of these models. But doing it the first way will enable you to use the -margins- command later to get predicted outcomes and marginal effects, whereas the latter will give you incorrect answers if you try to use -margins- (because -margins- will not know that the variable interaction_1998 is actually the interaction of year_1998 and treatment). So do it the first way I show here.
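A minimal sketch checking that the factor-variable version and the hand-made version produce the same -xtreg- estimate, on hypothetical simulated data (the seed, sample size and effect of 1.5 in 1998 are made up). The factor-variable version is written as 1998.year##i.treatment, which already includes 1998.year, and year_1998 and interaction_1998 are the hand-made variables described in #9.

```stata
* Hypothetical simulated panel: 300 workers, years 1990-2000
clear
set seed 42
set obs 300
generate worker = _n
generate treatment = (worker <= 150)
generate alpha = rnormal()                 // worker fixed effect
expand 11
bysort worker: generate year = 1989 + _n   // 1990, ..., 2000
generate Y = alpha + 1.5*treatment*(year == 1998) + rnormal()
xtset worker year

* Way 1: factor-variable notation (-margins- can use it afterwards)
xtreg Y 1998.year##i.treatment, fe vce(cluster worker)
scalar b_fv = _b[1998.year#1.treatment]

* Way 2: hand-made indicator and product term (same estimate, but
* -margins- does not know that interaction_1998 is an interaction)
generate year_1998 = (year == 1998)
generate interaction_1998 = year_1998*treatment
xtreg Y i.year_1998 interaction_1998, fe vce(cluster worker)
scalar b_hand = _b[interaction_1998]

display "factor variables: " b_fv "   hand-made: " b_hand
```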



                    • #11
                      Thank you Clyde, perfectly clear. I will use the regression you indicated:
                      xtreg Y 1998.year 1998.year##treatment, fe cluster(worker)
                      Then, in order to get exactly the marginal effect of the reform on the treatment group, let's suppose, in 1998, should I use this after the regression?

                      margins dydx(*)

                      My goal is to understand exactly how the reform had an impact on, say, the number of weeks worked (my Y). So my coefficient of interest should reveal the marginal change in weeks worked for the treatment group with respect to the control group in 1998.
                      But it seems I'm not getting the marginal effect of the interaction terms.


                                    |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
                      --------------+----------------------------------------------------------------
                          1.year_98 |   .0609549     .05562     1.10   0.273    -.0480582    .1699681
                        1.treatment |  -.7904914   1.188271    -0.67   0.506    -3.119459    1.538476



                      • #12
                        But it seems I'm not getting the marginal effect of the interaction terms.
                        That's for the same reason you are not getting unicorns. Neither unicorns nor marginal effects of interaction terms exist.

                        In order to get exactly the marginal effect of the reform on the treatment group, let's suppose, in 1998,
                        The output you show gives the average marginal effect of being in year 98 on both groups, and, below it, the average marginal effect of being in the treatment group, both before and after 98. Neither of these is what you want. For that you need a slightly different command:

                        Code:
                        margins treatment, dydx(year_98)
                        This will give you two outputs: the marginal effect of being in year_98 in the treatment group and in the non-treatment group. You can just ignore the non-treatment group row from the output table if it's not also of interest to you.

                        However, none of these is the DID estimator of the treatment effect. For that, you need to look to the regression output, not the -margins- command. The DID estimator of the treatment effect is the regression coefficient of the interaction term.
                        Last edited by Clyde Schechter; 17 Jan 2018, 17:04.
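In the simplest case, two groups and two periods in a balanced panel, this can be checked by hand: the interaction coefficient equals the before-after change in the treated group's mean minus the same change in the control group's mean. A minimal sketch on hypothetical simulated data (post is a made-up before/after indicator; seed and effect sizes are arbitrary) computes both. With only two periods per worker the fixed-effects estimator coincides with first differencing, which is why the two numbers agree exactly.

```stata
* Hypothetical balanced panel: 400 workers, each observed once before
* (post = 0) and once after (post = 1)
clear
set seed 7
set obs 400
generate worker = _n
generate treatment = (worker <= 200)
generate alpha = rnormal()                 // worker fixed effect
expand 2
bysort worker: generate post = _n - 1
generate Y = alpha + 0.5*post + 1*treatment*post + rnormal()
xtset worker

xtreg Y i.post##i.treatment, fe vce(cluster worker)
scalar b_did = _b[1.post#1.treatment]

* The same quantity from the four group means
summarize Y if treatment == 1 & post == 1, meanonly
scalar mT1 = r(mean)
summarize Y if treatment == 1 & post == 0, meanonly
scalar mT0 = r(mean)
summarize Y if treatment == 0 & post == 1, meanonly
scalar mC1 = r(mean)
summarize Y if treatment == 0 & post == 0, meanonly
scalar mC0 = r(mean)
scalar did = (mT1 - mT0) - (mC1 - mC0)

display "interaction coefficient: " b_did "   difference in differences: " did
```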
