  • Problem with replication of TWFE Model

    Hi everybody,

    I am analyzing the effect of a school construction program on education. I am using individual-level panel data (5 waves), matched on birthplace to the school construction data.
    To identify individuals who have been exposed to the program, I am using the variation in year of birth and region of birth. Individuals born between 1968 and 1972 are the treatment group, and cohorts 1958 to 1963 form the control group. I multiply this dummy with the treatment intensity of the school program in each region, calculated as schools built per 1,000 children (youngXnin). I add region of birth and year of birth fixed effects and cluster the standard errors at the region of birth level. Furthermore, I control for the pre-program enrollment rates, number of children and another policy implemented during the same time at the regional level, interacted with the year of birth. I tagged the individuals by their highest years of education (yoe).

    I have run the following regression:

    Code:
    areg yoe youngXnin i.yob i.yob#c.en71 i.yob#c.ch71 i.yob#c.wsppc female if tag==1, abs(birthpl) cluster(birthpl)
    My result looks like this:

    Code:
     
    note: 1972.yob#c.en71 omitted because of collinearity
    note: 1972.yob#c.ch71 omitted because of collinearity
    note: 1972.yob#c.wsppc omitted because of collinearity
    
    Linear regression, absorbing indicators         Number of obs     =      5,986
    Absorbed variable: birthpl                      No. of categories =        242
                                                    F(  42,    241)   =      35.82
                                                    Prob > F          =     0.0000
                                                    R-squared         =     0.2778
                                                    Adj R-squared     =     0.2420
                                                    Root MSE          =     3.4042
    
                                  (Std. Err. adjusted for 242 clusters in birthpl)
    ------------------------------------------------------------------------------
                 |               Robust
             yoe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       youngXnin |   .1817468   .1288603     1.41   0.160    -.0720894     .435583
          female |  -1.008165   .1054618    -9.56   0.000     -1.21591   -.8004209
                 |
             yob |
           1958  |  -1.454403   .6948977    -2.09   0.037    -2.823252   -.0855548
           1959  |   1.006075   .5946716     1.69   0.092    -.1653431    2.177492
           1960  |   1.720255   .6202648     2.77   0.006     .4984226    2.942088
           1961  |   .5529631   .5809999     0.95   0.342    -.5915233    1.697449
           1962  |   1.401703   .5385698     2.60   0.010     .3407979    2.462608
           1968  |   2.022092   .5760126     3.51   0.001     .8874304    3.156755
           1969  |   3.192242   .5136172     6.22   0.000      2.18049    4.203994
           1970  |    3.68983   .5406953     6.82   0.000     2.624738    4.754922
           1971  |   3.338817   .5653397     5.91   0.000     2.225179    4.452455
           1972  |   3.322119   .6062628     5.48   0.000     2.127868     4.51637
                 |
      yob#c.en71 |
           1957  |  -3.188886   3.273782    -0.97   0.331    -9.637765    3.259993
           1958  |   3.326529   2.634466     1.26   0.208    -1.862991    8.516049
           1959  |   .0702168   1.809247     0.04   0.969     -3.49374    3.634174
           1960  |  -2.737319   1.668069    -1.64   0.102    -6.023176    .5485377
           1961  |  -1.312241   2.450678    -0.54   0.593    -6.139723    3.515242
           1962  |   .7871137   1.437824     0.55   0.585    -2.045193    3.619421
           1968  |   1.592497   1.799092     0.89   0.377    -1.951456     5.13645
           1969  |   .3691274   1.573141     0.23   0.815    -2.729733    3.467988
           1970  |   1.954537    1.64173     1.19   0.235    -1.279435    5.188509
           1971  |   .1603102   1.921413     0.08   0.934    -3.624596    3.945217
           1972  |          0  (omitted)
                 |
      yob#c.ch71 |
           1957  |   5.17e-06   2.41e-06     2.15   0.033     4.30e-07    9.92e-06
           1958  |   4.08e-06   2.19e-06     1.86   0.064    -2.43e-07    8.40e-06
           1959  |  -7.02e-07   2.20e-06    -0.32   0.750    -5.04e-06    3.64e-06
           1960  |  -6.26e-07   2.18e-06    -0.29   0.774    -4.91e-06    3.66e-06
           1961  |   2.50e-06   2.22e-06     1.12   0.262    -1.88e-06    6.88e-06
           1962  |  -1.28e-06   2.22e-06    -0.58   0.564    -5.66e-06    3.09e-06
           1968  |   1.53e-06   2.35e-06     0.65   0.516    -3.11e-06    6.17e-06
           1969  |  -8.46e-07   2.15e-06    -0.39   0.695    -5.09e-06    3.40e-06
           1970  |  -1.76e-06   1.95e-06    -0.90   0.369    -5.61e-06    2.09e-06
           1971  |  -8.76e-07   2.00e-06    -0.44   0.662    -4.82e-06    3.06e-06
           1972  |          0  (omitted)
                 |
     yob#c.wsppc |
           1957  |   1.761464   1.080673     1.63   0.104    -.3673056    3.890234
           1958  |   2.369885   .9174623     2.58   0.010      .562616    4.177154
           1959  |   .8455982    .492575     1.72   0.087    -.1247038      1.8159
           1960  |   .0375649   .4819009     0.08   0.938    -.9117106    .9868404
           1961  |   1.813205   .8031514     2.26   0.025     .2311121    3.395298
           1962  |   .4528254   .3856467     1.17   0.241    -.3068432    1.212494
           1968  |   .0566797   .4649344     0.12   0.903    -.8591742    .9725336
           1969  |   .3267577    .306535     1.07   0.288    -.2770721    .9305876
           1970  |  -.6072225   .4483307    -1.35   0.177    -1.490369    .2759245
           1971  |   .5620162   .4839056     1.16   0.247    -.3912082    1.515241
           1972  |          0  (omitted)
                 |
           _cons |   5.588359   .5779029     9.67   0.000     4.449974    6.726745
    ------------------------------------------------------------------------------
    
    .


    A lot of studies analyzed the effect of the program on education, e.g.,
    Duflo, E. (2001). "Schooling and labor market consequences of school construction in Indonesia: Evidence from an unusual policy experiment." American Economic Review 91(4): 795-813.
    Mazumder, B., et al. (2019). Intergenerational Human Capital Spillovers: Indonesia's School Construction and Its Effects on the Next Generation. AEA Papers and Proceedings.

    They all find significant effects of the program on education. Duflo (2001) uses different data, but Mazumder et al. (2019) get data from the same source.

    Question:
    I am wondering where the differences in the magnitude and the significance of the estimates come from. I checked my data and code several times, but I don't see the problem.
    Also, as soon as I add the fixed effects, the estimate on the treatment loses its significance. This would indicate that there is not enough variation left in the treatment within cohorts and regions, right?
    Do you have any ideas?

    Any advice is appreciated!
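
    A minimal sketch of one way to check the variation question above, not a definitive diagnostic: regress the treatment variable on the same fixed effects and controls, and look at how much of its variance is left over. The variable name youngXnin_res is made up for illustration; all other names come from the regression above.

    ```stata
    * Partial the fixed effects and controls out of the treatment variable,
    * on the same observations that enter the main regression.
    areg youngXnin i.yob i.yob#c.en71 i.yob#c.ch71 i.yob#c.wsppc female ///
        if tag==1 & !missing(yoe), absorb(birthpl)

    * youngXnin_res is a hypothetical name for the residualized treatment.
    predict double youngXnin_res if e(sample), residuals

    * Compare the raw spread of the treatment with what survives the fixed effects.
    summarize youngXnin youngXnin_res if e(sample)
    display "Share of treatment variance left: " 1 - e(r2)
    ```

    If the share left is very small, the coefficient on youngXnin is identified from little variation and its standard error will be large, whatever the true effect is.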

  • #2
    Originally posted by Jennifer Klaus
    but Mazumder et al. (2019) get data from the same source.

    Question:
    I am wondering where the differences in the magnitude and the significance of the estimates come from. I checked my data and code several times, but I don't see the problem.

    Are you using exactly the same sample as they do? The first thing you should do is try to replicate their results. Differences may arise from different samples, data revisions (if you are considering the same sample period), or errors and omissions in data preparation and model specification. The AER has a policy of having authors deposit their data and code on its website after publication, so you should look at their analysis. On the other hand, if you are using a different sample, nothing says that the results have to be the same. If your implementation is correct, the difference is in itself a result.


    • #3
      I was able to replicate their results with the exact data set they provided. However, I built my own data set because I needed additional information from the survey.
      I cross-checked the number of observations and every variable I am using with their data. Of course there are differences, but all in all it's very similar. That's why I don't understand the difference in significance (1% vs. no significance).
      I am worried about the result because I wanted to use the exposure to the school construction program as an instrument for further analysis.


      • #4
        I am not familiar enough with this literature to comment further. You probably need to consult someone who knows the dataset and the literature about the implications of the non-significance.


        • #5
          Thank you for your opinion, Andrew.
          Do you think I could still use the interaction as an instrument for education in a 2SLS model, even though the first stage is not significant?


          • #6
            Why not go ahead and run the IV regression using xtivreg, or ivreghdfe (from SSC), and check whether the instrument passes the weak-instrument test?
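
            A rough sketch of what that could look like with ivreghdfe, reusing the first-stage specification from #1. The name outcome is a hypothetical placeholder for the second-stage dependent variable, which the thread does not name, and I am assuming factor-variable notation is accepted by your installed version.

            ```stata
            * One-time installation from SSC; ivreghdfe builds on the other packages.
            ssc install ftools
            ssc install reghdfe
            ssc install ivreg2
            ssc install ranktest
            ssc install ivreghdfe

            * outcome is a hypothetical placeholder for the second-stage dependent variable.
            * yoe is instrumented by youngXnin; birthplace and cohort effects are absorbed.
            ivreghdfe outcome i.yob#c.en71 i.yob#c.ch71 i.yob#c.wsppc female ///
                (yoe = youngXnin) if tag==1, absorb(birthpl yob) cluster(birthpl) first
            ```

            With clustered standard errors, the weak-identification statistic in the output is the Kleibergen-Paap rk Wald F, and the first option prints the first-stage regression. With a single instrument that F is roughly the squared first-stage t statistic, so a t of 1.41 corresponds to an F of about 2, well below the common rule-of-thumb value of 10.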
