  • DiD with a lagged variable using xtreg and regress.

    Hello everyone,

    I am estimating a difference-in-differences (DiD) model with a lagged dependent variable (t-1) using xtreg and regress.
    My data set contains 8,232 students in a panel data format with T=5 (waves). For each student, I have the test scores (Zprofic_mat) and a list of observed variables ($controlvar) over the time period. Then, I create the lagged dependent variable for t-1.

    generate ZMat_L1 = L1.Zprofic_mat
    During the period covered (2003-2008), a policy change was implemented in state schools in 2007. Students from state schools are therefore my treatment group and students from municipal schools are the control group. The DiD variable equals 1 if the student is enrolled in a state school (treated) in the post-treatment period (time).

    When I estimate the model using regress, the results look good. Note that (as expected) the coefficient on the lagged variable (ZMat_L1) is positive, indicating a strong correlation between test scores over time.

    Code:
    reg Zprofic_mat ZMat_L1 DiD time treated

          Source |       SS       df       MS              Number of obs =   12103
    -------------+------------------------------           F(  4, 12098) = 4245.11
           Model |  6335.16585     4  1583.79146           Prob > F      =  0.0000
        Residual |  4513.59566 12098  .373086101           R-squared     =  0.5840
    -------------+------------------------------           Adj R-squared =  0.5838
           Total |  10848.7615 12102  .896443687           Root MSE      =  .61081

    ------------------------------------------------------------------------------
     Zprofic_mat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         ZMat_L1 |   .7728753   .0061332   126.01   0.000     .7608532    .7848974
             DiD |  -.0511167   .0255607    -2.00   0.046    -.1012198   -.0010137
            time |   .0949971   .0176783     5.37   0.000     .0603447    .1296494
         treated |   .1167293   .0131256     8.89   0.000     .0910011    .1424576
           _cons |  -.0654067   .0097313    -6.72   0.000    -.0844815   -.0463318
    ------------------------------------------------------------------------------

    But when I estimate the same model using xtreg, the coefficient on ZMat_L1 shrinks and becomes negative.

    Code:
    xtreg Zprofic_mat ZMat_L1 DiD time treated, fe

    Fixed-effects (within) regression               Number of obs      =     12103
    Group variable: IDaluno                         Number of groups   =      4881

    R-sq:  within  = 0.0231                         Obs per group: min =         1
           between = 0.1807                                        avg =       2.5
           overall = 0.0823                                        max =         4

                                                    F(4,7218)          =     42.71
    corr(u_i, Xb)  = -0.3905                        Prob > F           =    0.0000

    ------------------------------------------------------------------------------
     Zprofic_mat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
         ZMat_L1 |  -.0408561   .0111915    -3.65   0.000    -.0627948   -.0189175
             DiD |   .0807754   .0234132     3.45   0.001     .0348787    .1266721
            time |   .1067221   .0160304     6.66   0.000     .0752977    .1381465
         treated |   .0472461   .1129775     0.42   0.676    -.1742229     .268715
           _cons |   -.184291    .058919    -3.13   0.002    -.2997895   -.0687925
    -------------+----------------------------------------------------------------
         sigma_u |   .9238459
         sigma_e |  .47841538
             rho |  .78853743   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    F test that all u_i=0:     F(4880, 7218) =     2.56          Prob > F = 0.0000


    To be honest, I do not understand why the coefficient on the lagged dependent variable changes so much between regress and xtreg. Can anyone please help me with the interpretation?

    PS: Please note that the model above is only a reduced form for illustration. In the "real" estimation, I will include the control variables and school and time fixed effects, and cluster the standard errors at the class level. For this reason I would prefer to use xtreg for the estimation.

    Any advice would be highly appreciated!
    Thanks in advance.
    Last edited by Tharcisio Leone; 29 Dec 2021, 05:34.

  • #2
    Your bigger problem is the potential bias from the lagged dependent variable (LDV) given that you have only 5 time periods. You cannot easily distinguish between fixed effects and lagged dependence in this model - an individual can have a high value of the outcome either because she has a high fixed effect or because her values of the outcome were high in the past. With only a few time series observations, it is difficult to isolate these two effects. The usual approach in the literature is to treat the LDV as jointly determined with the outcome and proceed with an instrumental variables (IV)-type estimation. For this, see

    Code:
    help xtabond
    as a starting point.
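
    To make the problem concrete, here is a minimal simulation sketch. It is not from the thread: every name and value in it (id, t, a, y, 2,000 students, six waves, a true lag coefficient of 0.5) is an assumption chosen for illustration. It generates a short panel with student-level effects and compares pooled OLS, the within estimator, and Arellano-Bond GMM.

    ```stata
    * Illustrative simulation; all names and parameter values are assumptions.
    clear
    set seed 12345
    set obs 2000                        // hypothetical number of students
    generate id = _n
    generate a = rnormal()              // time-invariant student effect
    expand 6                            // six waves per student
    bysort id: generate t = _n
    xtset id t
    generate y = a + rnormal() if t == 1
    forvalues s = 2/6 {
        * true coefficient on the lagged outcome is 0.5
        replace y = 0.5*L.y + a + rnormal() if t == `s'
    }
    regress y L.y                       // pooled OLS: ignores the student effect
    xtreg y L.y, fe                     // within estimator: LDV bias when T is small
    xtabond y, lags(1)                  // Arellano-Bond difference GMM
    ```

    In a short panel like this, the pooled OLS coefficient should come out above 0.5 and the within coefficient below it, the same direction of disagreement as between the regress and xtreg results in #1.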
    Last edited by Andrew Musau; 29 Dec 2021, 09:13.

    • #3
      Dear Andrew,

      first of all, thank you very much for your message.
      But there must have been a misunderstanding. To the best of my knowledge, my model does not suffer from Nickell bias because I do not have individual fixed effects (only school and time FE). For this reason, I do not have to apply a first-difference transformation (GMM) to remove the constant terms and the individual fixed effects from the model.

      The xtreg would work fine here. My problem is only with the interpretation of the ZMat_L1.

      • #4
        There is no surprise or paradox here. -xtreg, fe- estimates within panel effects only. -regress- instead implicitly constrains the within- and between-panel effects to be the same and estimates this common effect. The results you are getting constitute evidence that the implicit assumption that within- and between- effects are identical is incorrect. So, you need to decide whether you are interested in the between school or within-school effects of these variables and choose your model accordingly. Or, you can determine that you need to estimate both and run a hybrid model. (-xthybrid- from SSC will do this).
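
        As a rough illustration of the within/between distinction, a regressor can be split by hand into its panel mean and the deviation from that mean. The sketch below uses Stata's Grunfeld example data rather than the data in this thread, and the variable names mvalue_mean and mvalue_dev are made up here.

        ```stata
        webuse grunfeld, clear
        xtset company year
        * between component: company-specific mean of the regressor
        bysort company: egen mvalue_mean = mean(mvalue)
        * within component: deviation from the company mean
        generate mvalue_dev = mvalue - mvalue_mean
        * random-effects model with separate within and between coefficients
        xtreg invest mvalue_dev mvalue_mean, re
        * the coefficient on mvalue_dev should match this fixed-effects estimate
        xtreg invest mvalue, fe
        * pooled OLS forces a single coefficient on both components
        regress invest mvalue
        ```

        This only shows the decomposition for a static regressor. It does not by itself deal with the bias that arises when the regressor is a lagged dependent variable.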

        • #5
          Originally posted by Tharcisio Leone
          Dear Andrew,

          first of all, thank you very much for your message.
          But there must have been a misunderstanding. To the best of my knowledge, my model does not suffer from Nickell bias because I do not have individual fixed effects (only school and time FE). For this reason, I do not have to apply a first-difference transformation (GMM) to remove the constant terms and the individual fixed effects from the model.

          The xtreg would work fine here. My problem is only with the interpretation of the ZMat_L1.
          In #1, you state that you have a panel of students.

          My data set contains 8,232 students in a panel data format with T=5 (waves). For each student, I have the test scores (Zprofic_mat) and a list of observed variables ($controlvar) over the time period. Then, I create the lagged dependent variable for t-1.
          If this is the case, and "IDaluno" identifies a student, i.e., in your xtset command you had

          Code:
          xtset  IDaluno year
          then your xtreg command includes individual (student) fixed effects. Or does "IDaluno" identify a school? If you do not want to exploit the panel structure of your data, you need to consider the consequences of not controlling for unobserved time-invariant student effects. If your analysis is descriptive and not causal, that's fine.
          Last edited by Andrew Musau; 29 Dec 2021, 12:05.

          • #6
            Many thanks for your support!

            1. Individual fixed effects
            @Andrew: Yes, you are right. "IDaluno" identifies the students, so student fixed effects should indeed be included. (Sorry, I overlooked this.)
            There are some papers published in top-tier journals where FE and an LDV were estimated together without IVs (see e.g. Britton and Propper 2016, especially equation 1 and table 2). In contrast, I found no study in the context of teacher bonuses using GMM models for the estimation. How do you interpret this? Maybe this bias is not a big issue for the journals?

            2. Surprise or paradox with the results
            To clarify, I estimated the model using OLS, FE and GMM.
            Code:
            reg Zprofic_mat DiD time treated
            eststo reg
            reg Zprofic_mat ZMat_L1 DiD time treated
            eststo regLag
            xtreg Zprofic_mat DiD time treated, fe
            eststo xtreg
            xtreg Zprofic_mat ZMat_L1 DiD time treated, fe
            eststo xtregLag
            xtabond L(0/1).Zprofic_mat DiD time treated
            eststo xtabond

            esttab reg regLag xtreg xtregLag xtabond, keep(DiD time treated ZMat_L1) stats(N r2) cells(b(star fmt(3)) se(par fmt(3)))

            --------------------------------------------------------------------------------------------
                                  (reg)        (regLag)         (xtreg)      (xtregLag)       (xtabond)
                            Zprofic_mat     Zprofic_mat     Zprofic_mat     Zprofic_mat     Zprofic_mat
                                   b/se            b/se            b/se            b/se            b/se
            --------------------------------------------------------------------------------------------
            DiD                -0.255***       -0.051*          0.087***        0.081***        0.030
                              (0.033)         (0.026)         (0.021)         (0.023)         (0.024)
            time                0.148***        0.095***        0.112***        0.107***        0.119***
                              (0.023)         (0.018)         (0.014)         (0.016)         (0.016)
            treated             0.381***        0.117***        0.115           0.047          -0.080
                              (0.015)         (0.013)         (0.083)         (0.113)         (0.121)
            ZMat_L1                             0.773***                       -0.041***       -0.356***
                                              (0.006)                         (0.011)         (0.011)
            --------------------------------------------------------------------------------------------
            N               19520.000       12103.000       19520.000       12103.000        7040.000
            r2                  0.034           0.584           0.018           0.023
            --------------------------------------------------------------------------------------------

            Note that with GMM the coefficient on ZMat_L1 remains negative. As @Clyde highlighted, I need to decide which model should be used.
            My main interest is to present results free of bias, but I am not sure which model that would be. I am really surprised to see these negative values for the LDV, especially because the correlation between the test scores over time is high and positive (see below).
            Code:
            correlate Zprofic_mat ZMat_L1
            (obs=14702)

                         | Zprofi~t  ZMat_L1
            -------------+------------------
             Zprofic_mat |   1.0000
                 ZMat_L1 |   0.7949   1.0000

            I feel like I am missing something. Can anyone help me with this issue?

            • #7
              There are some papers published in top-tier journals where FE and an LDV were estimated together without IVs (see e.g. Britton and Propper 2016, especially equation 1 and table 2). In contrast, I found no study in the context of teacher bonuses using GMM models for the estimation. How do you interpret this? Maybe this bias is not a big issue for the journals?
              What are the sample sizes? The LDV bias is of order \(\frac{1}{T}\). If \(T\) is sufficiently large, it can be ignored and the fixed effects model can be used.
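
              For reference, a standard way to write this down (the notation is introduced here and is not from the thread): in the pure autoregressive model \(y_{it} = \rho\, y_{i,t-1} + \alpha_i + \varepsilon_{it}\), with \(T\) the number of periods used in estimation, Nickell's (1981) approximation to the large-\(N\) inconsistency of the within estimator for reasonably large \(T\) is

              ```latex
              \operatorname*{plim}_{N \to \infty} \left( \hat{\rho}_{FE} - \rho \right) \approx -\frac{1 + \rho}{T - 1}
              ```

              So for \(\rho > 0\) the bias is negative and shrinks at rate \(1/T\). With exogenous regressors in the model the exact expression changes, but the order of magnitude does not.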


              My main interest is to present results free of bias, but I am not sure which model that would be.
              I do not know the underlying theory, but inclusion of a lagged dependent variable on the right-hand side implies that you believe that there exists a dynamic relationship where past values of your outcome influence current values. If the theory suggests such a relationship, you should focus on the GMM results as the other specifications will result in biased estimates. Then, you can perform diagnostics on this.


              I am really surprised to see these negative values for the LDV, especially because the correlation between the test scores over time is high and positive (see below).

              . correlate Zprofic_mat ZMat_L1
              (obs=14702)

                           | Zprofi~t  ZMat_L1
              -------------+------------------
               Zprofic_mat |   1.0000
                   ZMat_L1 |   0.7949   1.0000
              You cannot simply look at the bivariate correlation and expect that the effect will be the same once you control for a whole host of things. There are omitted variables which once you include can change the magnitude and direction of the effect.
              Last edited by Andrew Musau; 30 Dec 2021, 10:23.

              • #8
                1. Sample size and T
                In Britton and Propper 2016 the sample size and T are lower than in my study (N=6,000 and T=2).

                2. Bivariate correlation
                I did not include the control variables in the output above. Once they are included, the magnitude of the LDV coefficient changes but not its sign.

                3. Theory
                The underlying theory says that the value-added strategy is necessary. One important peculiarity of the educational production function is its cumulative character over time. Student achievement at time t depends not only on the educational inputs applied during t, but also on the sum of all inputs that have already been integrated into the student's learning process, plus initial ability. Therefore, the student's achievement at time t-1 strongly influences the outcome at t.

                My trade-off here is to decide between a "biased" FE model that has been published in top-tier journals and a GMM model that theoretically solves the Nickell bias but has not been applied in impact evaluation research.
                In addition, it is still a puzzle to me that the LDV coefficient becomes negative under FE and GMM, especially because the correlation between the test scores over time is high and positive.

                • #9
                  Originally posted by Tharcisio Leone
                  1. Sample size and T
                  In Britton and Propper 2016 the sample size and T are lower than in my study (N=6,000 and T=2).
                  Now that is impossible. You lose one period through lagging, so if \(T=2\), you are left with only one observation per unit, and you need a minimum of two to run a fixed effects model. I am sure that you have this wrong. You can experiment yourself with the Grunfeld dataset as below:

                  Code:
                  webuse grunfeld, clear
                  keep if time <=2
                  xtset company year
                  xtreg invest L.invest mvalue, fe
                  Res.:

                  Code:
                  . xtreg invest L.invest mvalue, fe
                  note: L.invest omitted because of collinearity
                  note: mvalue omitted because of collinearity
                  
                  Fixed-effects (within) regression               Number of obs     =         10
                  Group variable: company                         Number of groups  =         10
                  
                  R-sq:                                           Obs per group:
                       within  =      .                                         min =          1
                       between =      .                                         avg =        1.0
                       overall =      .                                         max =          1
                  
                                                                  F(0,0)            =       0.00
                  corr(u_i, Xb)  =      .                         Prob > F          =          .
                  
                  ------------------------------------------------------------------------------
                        invest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  -------------+----------------------------------------------------------------
                        invest |
                           L1. |          0  (omitted)
                               |
                        mvalue |          0  (omitted)
                         _cons |    101.607          .        .       .            .           .
                  -------------+----------------------------------------------------------------
                       sigma_u |  144.84996
                       sigma_e |          .
                           rho |          .   (fraction of variance due to u_i)
                  ------------------------------------------------------------------------------
                  F test that all u_i=0: F(9, 0) = .                           Prob > F =      .
                  
                  .
                  As a general comment, do not take whatever you see published in academic journals as gospel. The Journal of Public Economics (JPE) is a top field journal in public economics, but its review process is not as thorough as what you will see in the top 5 economics journals. The issue is that sometimes you will find referees who are experts in the topic that an article addresses but are not very statistically minded. Therefore, there is a large variance in the quality of the methodologies employed. We can agree that it is not the case that \(T=2\) in this instance, but if, say, \(T=3\), then this study is a good candidate for replication if you can get access to the data. I do a few of these myself, and here is one based on a paper published in, er, JPE.

                  2. Bivariate correlation
                  I did not include the control variables in the output above. Once they are included, the magnitude of the LDV coefficient changes but not its sign.
                  This is to be expected. You just need to find a story to explain it.

                  3. Theory
                  The underlying theory says that the value-added strategy is necessary. One important peculiarity of the educational production function is its cumulative character over time. Student achievement at time t depends not only on the educational inputs applied during t, but also on the sum of all inputs that have already been integrated into the student's learning process, plus initial ability. Therefore, the student's achievement at time t-1 strongly influences the outcome at t.

                  My trade-off here is to decide between a "biased" FE model that has been published in top-tier journals and a GMM model that theoretically solves the Nickell bias but has not been applied in impact evaluation research.
                  Estimate both the FE model and the dynamic model (using GMM) and present the results side by side in a table. Then comment on the FE results and argue that the dynamic model results should be preferred because of the LDV bias, which will be apparent in the differences between the coefficient estimates. At the end of the day, you want to be able to defend whatever you do, and no one will criticize you for choosing the better model over an inferior one.

                  In addition, it is still a puzzle to me that the LDV coefficient becomes negative under FE and GMM, especially because the correlation between the test scores over time is high and positive.
                  Back to the second comment, this is what you will need to explain once you are comfortable with the diagnostics of the dynamic model. Plainly, this will result from the inclusion of the fixed effects and your other control variables.
                  Last edited by Andrew Musau; 30 Dec 2021, 17:21.

                  • #10
                    I sincerely appreciate all your valuable comments. They were a great help. Thanks again !!

                    Now that is impossible. You lose one period through lagging, so if T=2, you are left with only one observation per unit, and you need a minimum of two to run a fixed effects model. I am sure that you have this wrong. You can experiment yourself with the Grunfeld dataset as below:
                    Please let me clarify the empirical model in Britton and Propper 2016.
                    In this study the dependent variable is the test score at school-leaving age (Key Stage 4) and the LDV is the exam score at entry into the school at age 11 (Key Stage 2).
                    This is why I wrote T=2. The model does, of course, use at least two time periods.

                    This is an empirical model that I could replicate in my study as well. Instead of using T=5, I can include only the first test score (at entry into the school) as an explanatory variable.

                    • #11
                      It's not really DID if you have a lagged dependent variable. With T = 2, DID with panel data is the same as regressing D.Y on a constant and DiD. Your first regression is the same as D.Y on L.Y DiD. So one controls for lagged Y, the other doesn't. I discuss in my MIT Press book how these are based on different assumptions about the policy assignment.
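
                      Spelled out, with notation that is assumed here for illustration and not taken from the book: with two periods and \(D_i = 1\) for units treated in period 2, the two specifications are

                      ```latex
                      \begin{aligned}
                      \text{DiD (fixed effects):}\quad & \Delta y_{i2} = \theta + \tau D_i + \Delta u_{i2} \\
                      \text{Lagged outcome:}\quad & y_{i2} = \alpha + \rho\, y_{i1} + \tau D_i + e_{i2}
                      \;\Longleftrightarrow\;
                      \Delta y_{i2} = \alpha + (\rho - 1)\, y_{i1} + \tau D_i + e_{i2}
                      \end{aligned}
                      ```

                      The first imposes \(\rho = 1\), i.e. a zero coefficient on \(y_{i1}\) in the differenced equation, while the second estimates \(\rho\) from the data. That is why \(\tau\) has a different interpretation in each.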

                      Like Andrew, I'm puzzled how you can implement GMM when you are using T = 2. Your output suggests T as high as four for some students. In any case, with such a small average T you should not use the usual fixed effects estimator for the model with lagged Y. Also, I would recommend, if you really have up to four time periods, putting in i.time rather than just time itself.

                      A traditional DiD does not have lagged Y in the equation, and you have to reinterpret the treatment effect that you're identifying.

                      How many years after the intervention do you have?

                      • #12
                        Many thanks for your support.

                        How many years after the intervention do you have?
                        Only 1 year (2008) for the post-treatment period.

                        Like Andrew, I'm puzzled how you can implement GMM when you are using T = 2.
                        Just to clarify.
                        In #1, I used a GMM with T=5.
                        In #10, I would use OLS and FE with T=2 only (as in Britton and Propper 2016).

                        I would recommend, if you really have up to four time periods, putting in i.time rather than just time itself.
                        What do you mean exactly?
                        Code:
                        regress Zprofic_mat ZMat_L1 DiD i.time treated ???
                        In this case, I am not able to control for individual and school fixed effects, which is particularly important for an education production function.
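
                        One possible reading of the i.time suggestion, as a sketch only: it assumes a wave variable called year (as in the xtset example in #5) and IDaluno as the panel identifier. Factor-variable notation adds one dummy per wave in place of the single post-period indicator, and xtreg, fe supplies the student effects, so the two need not be mutually exclusive.

                        ```stata
                        * Assumed names: IDaluno (student id), year (wave); adjust to the actual data.
                        xtset IDaluno year
                        * i.year adds a dummy for each wave; the post-period dummy time is
                        * collinear with these dummies and is therefore left out.
                        xtreg Zprofic_mat DiD treated i.year, fe vce(cluster IDaluno)
                        ```

                        This is a static specification without ZMat_L1, in line with the remark in #11 that a traditional DiD has no lagged outcome in the equation.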

                        • #13
                          Dear Clyde Schechter,
                          could you please point me to some literature where I can find more information about within- and between-panel effects in a context with a lagged dependent variable (see your comment in #4)?
                          My own search for this theoretical background was unsuccessful.

                          Originally posted by Clyde Schechter
                          There is no surprise or paradox here. -xtreg, fe- estimates within panel effects only. -regress- instead implicitly constrains the within- and between-panel effects to be the same and estimates this common effect. The results you are getting constitute evidence that the implicit assumption that within- and between- effects are identical is incorrect. So, you need to decide whether you are interested in the between school or within-school effects of these variables and choose your model accordingly. Or, you can determine that you need to estimate both and run a hybrid model. (-xthybrid- from SSC will do this).

                          • #14
                            This paper might help.

                            EDIT: Clyde mentioned the hybrid estimator. You might also look up something called "Mundlak" models which has become a little more popular in recent years

                            • #15
                              Originally posted by Jeff Wooldridge
                              It's not really DID if you have a lagged dependent variable. With T = 2, DID with panel data is the same as regressing D.Y on a constant and DiD. Your first regression is the same as D.Y on L.Y DiD. So one controls for lagged Y, the other doesn't. I discuss in my MIT Press book how these are based on different assumptions about the policy assignment.

                              Like Andrew, I'm puzzled how you can implement GMM when you are using T = 2. Your output suggests T as high as four for some students. In any case, with such a small average T you should not use the usual fixed effects estimator for the model with lagged Y. Also, I would recommend, if you really have up to four time periods, putting in i.time rather than just time itself.

                              A traditional DiD does not have lagged Y in the equation, and you have to reinterpret the treatment effect that you're identifying.

                              How many years after the intervention do you have?
                              Hello. Actually, I'm working on a similar topic right now, estimating a difference-in-differences model with a lagged dependent variable as a control variable. I have two years of data after the policy went into effect. The problem is figuring out how to interpret the coefficient on the DiD term. Will the LDV contaminate it? And how do you deal with the LDV's endogeneity problem? Perhaps the second lag of the dependent variable can be used as an instrument. However, this will result in a significant loss of observations, because I only have seven years in total and the policy took effect at the end of 2012. By the way, how can he obtain a coefficient on a time-invariant variable such as treated when using xtreg with fe? The within estimator cannot identify the coefficient on a time-invariant variable.




