Problem with replication of TWFE Model

Jennifer Klaus

Join Date: Sep 2021
Posts: 7


02 Feb 2022, 03:47

Hi everybody,

I am analyzing the effect of a school construction program on education. I am using individual-level panel data (5 waves), matched by birthplace to the school construction data.
To identify individuals exposed to the program, I use variation in year of birth and region of birth. Individuals born between 1968 and 1972 form the treatment group, and cohorts 1958 to 1963 form the control group. I multiply this treatment dummy by the program intensity in each region, measured as schools built per 1,000 children (youngXnin). I include region-of-birth and year-of-birth fixed effects and cluster the standard errors at the region-of-birth level. Furthermore, I control for pre-program enrollment rates, the number of children, and another policy implemented at the regional level during the same period, each interacted with year of birth. I tagged the individuals by their highest years of education (yoe).

I have run the following regression:

Code:

areg yoe youngXnin i.yob i.yob#c.en71 i.yob#c.ch71 i.yob#c.wsppc female if tag==1, abs(birthpl) cluster(birthpl)

My result looks like this:

Code:

 
note: 1972.yob#c.en71 omitted because of collinearity
note: 1972.yob#c.ch71 omitted because of collinearity
note: 1972.yob#c.wsppc omitted because of collinearity

Linear regression, absorbing indicators         Number of obs     =      5,986
Absorbed variable: birthpl                      No. of categories =        242
                                                F(  42,    241)   =      35.82
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2778
                                                Adj R-squared     =     0.2420
                                                Root MSE          =     3.4042

                              (Std. Err. adjusted for 242 clusters in birthpl)
------------------------------------------------------------------------------
             |               Robust
         yoe |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   youngXnin |   .1817468   .1288603     1.41   0.160    -.0720894     .435583
      female |  -1.008165   .1054618    -9.56   0.000     -1.21591   -.8004209
             |
         yob |
       1958  |  -1.454403   .6948977    -2.09   0.037    -2.823252   -.0855548
       1959  |   1.006075   .5946716     1.69   0.092    -.1653431    2.177492
       1960  |   1.720255   .6202648     2.77   0.006     .4984226    2.942088
       1961  |   .5529631   .5809999     0.95   0.342    -.5915233    1.697449
       1962  |   1.401703   .5385698     2.60   0.010     .3407979    2.462608
       1968  |   2.022092   .5760126     3.51   0.001     .8874304    3.156755
       1969  |   3.192242   .5136172     6.22   0.000      2.18049    4.203994
       1970  |    3.68983   .5406953     6.82   0.000     2.624738    4.754922
       1971  |   3.338817   .5653397     5.91   0.000     2.225179    4.452455
       1972  |   3.322119   .6062628     5.48   0.000     2.127868     4.51637
             |
  yob#c.en71 |
       1957  |  -3.188886   3.273782    -0.97   0.331    -9.637765    3.259993
       1958  |   3.326529   2.634466     1.26   0.208    -1.862991    8.516049
       1959  |   .0702168   1.809247     0.04   0.969     -3.49374    3.634174
       1960  |  -2.737319   1.668069    -1.64   0.102    -6.023176    .5485377
       1961  |  -1.312241   2.450678    -0.54   0.593    -6.139723    3.515242
       1962  |   .7871137   1.437824     0.55   0.585    -2.045193    3.619421
       1968  |   1.592497   1.799092     0.89   0.377    -1.951456     5.13645
       1969  |   .3691274   1.573141     0.23   0.815    -2.729733    3.467988
       1970  |   1.954537    1.64173     1.19   0.235    -1.279435    5.188509
       1971  |   .1603102   1.921413     0.08   0.934    -3.624596    3.945217
       1972  |          0  (omitted)
             |
  yob#c.ch71 |
       1957  |   5.17e-06   2.41e-06     2.15   0.033     4.30e-07    9.92e-06
       1958  |   4.08e-06   2.19e-06     1.86   0.064    -2.43e-07    8.40e-06
       1959  |  -7.02e-07   2.20e-06    -0.32   0.750    -5.04e-06    3.64e-06
       1960  |  -6.26e-07   2.18e-06    -0.29   0.774    -4.91e-06    3.66e-06
       1961  |   2.50e-06   2.22e-06     1.12   0.262    -1.88e-06    6.88e-06
       1962  |  -1.28e-06   2.22e-06    -0.58   0.564    -5.66e-06    3.09e-06
       1968  |   1.53e-06   2.35e-06     0.65   0.516    -3.11e-06    6.17e-06
       1969  |  -8.46e-07   2.15e-06    -0.39   0.695    -5.09e-06    3.40e-06
       1970  |  -1.76e-06   1.95e-06    -0.90   0.369    -5.61e-06    2.09e-06
       1971  |  -8.76e-07   2.00e-06    -0.44   0.662    -4.82e-06    3.06e-06
       1972  |          0  (omitted)
             |
 yob#c.wsppc |
       1957  |   1.761464   1.080673     1.63   0.104    -.3673056    3.890234
       1958  |   2.369885   .9174623     2.58   0.010      .562616    4.177154
       1959  |   .8455982    .492575     1.72   0.087    -.1247038      1.8159
       1960  |   .0375649   .4819009     0.08   0.938    -.9117106    .9868404
       1961  |   1.813205   .8031514     2.26   0.025     .2311121    3.395298
       1962  |   .4528254   .3856467     1.17   0.241    -.3068432    1.212494
       1968  |   .0566797   .4649344     0.12   0.903    -.8591742    .9725336
       1969  |   .3267577    .306535     1.07   0.288    -.2770721    .9305876
       1970  |  -.6072225   .4483307    -1.35   0.177    -1.490369    .2759245
       1971  |   .5620162   .4839056     1.16   0.247    -.3912082    1.515241
       1972  |          0  (omitted)
             |
       _cons |   5.588359   .5779029     9.67   0.000     4.449974    6.726745
------------------------------------------------------------------------------


Many studies have analyzed the effect of the program on education, e.g.,
Duflo, E. (2001). "Schooling and labor market consequences of school construction in Indonesia: Evidence from an unusual policy experiment." American Economic Review 91(4): 795-813.
Mazumder, B., et al. (2019). "Intergenerational human capital spillovers: Indonesia's school construction and its effects on the next generation." AEA Papers and Proceedings.

They all find significant effects of the program on education. Duflo (2001) uses different data, but Mazumder et al. (2019) get data from the same source.

Question:
I am wondering where the differences in the magnitude and the significance of the estimates come from. I checked my data and code several times, but I don't see the problem.
Also, as soon as I add the fixed effects, the estimate on the treatment loses its significance. This would indicate that there is not enough variation between the cohorts and regions, right?
Do you have any ideas?
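
One way to gauge how much variation in the treatment is left once the fixed effects and controls are partialled out is to regress the treatment variable itself on them and inspect the residual. This is only a minimal sketch: it assumes the same variable names as in the regression above, and res_treat is a new, hypothetical variable name.

```stata
* Sketch: regress the treatment on the fixed effects and controls
* (same sample and absorbed variable as the main regression).
areg youngXnin i.yob i.yob#c.en71 i.yob#c.ch71 i.yob#c.wsppc female ///
    if tag==1, abs(birthpl)

* Share of the variation in youngXnin NOT explained by the fixed effects
* and controls; a value near zero means little identifying variation.
display "Residual share of variation: " 1 - e(r2)

* res_treat (hypothetical name) holds the part of youngXnin that remains.
predict double res_treat if e(sample), residuals
summarize youngXnin res_treat if e(sample)
```

Comparing the standard deviation of res_treat with that of youngXnin gives a rough sense of how much the fixed effects absorb.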

Any advice is appreciated!


Andrew Musau

Join Date: Oct 2014

Posts: 9945
#2

02 Feb 2022, 07:09

Originally posted by Jennifer Klaus

but Mazumder et al. (2019) get data from the same source.

Question:
I am wondering where the differences in the magnitude and the significance of the estimates come from. I checked my data and code several times, but I don't see the problem.

Are you using the exact same sample as they do? The first thing you should do is try to replicate their results. Differences may arise from different samples, from data revisions if you are considering the same sample period, or from errors and omissions in data preparation and model specification. The AER has a policy of having authors deposit their data and code on its website after publication, so you should look at their analysis. On the other hand, if you are using a different sample, nothing says that the results have to be the same. If your implementation is correct, the difference is in itself a result.
Jennifer Klaus

Join Date: Sep 2021

Posts: 7
#3

02 Feb 2022, 07:32

I was able to replicate their results with the exact data set they provided. However, I built my own data set because I needed additional information from the survey.
I cross-checked the number of observations and every variable I am using with their data. Of course there are differences, but all in all it's very similar. That's why I don't understand the difference in significance (1% vs. no significance).
I am worried about the result because I wanted to use the exposure to the school construction program as an instrument for further analysis.
Andrew Musau

Join Date: Oct 2014

Posts: 9945
#4

02 Feb 2022, 08:19

I am not familiar enough with this literature to comment further. You probably need to consult someone who knows the dataset and the literature about the implications of the non-significance.
Jennifer Klaus

Join Date: Sep 2021

Posts: 7
#5

02 Feb 2022, 16:15

Thank you for your opinion, Andrew.
Do you think I could still use the interaction as an instrument for education in a 2SLS model, even though the first stage is not significant?
Andrew Musau

Join Date: Oct 2014

Posts: 9945
#6

02 Feb 2022, 17:51

Why not go ahead and run the IV regression using xtivreg or ivreghdfe from SSC and look at whether the instrument passes the weak instrumental variable test?
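
A minimal sketch of what that could look like, assuming ivreghdfe and its dependencies (ftools, reghdfe, ivreg2) are installed from SSC. The dependent variable outcome is a hypothetical placeholder for whatever the second stage is meant to explain; all other names are taken from the areg command in #1. With a single instrument and clustered standard errors, the reported Kleibergen-Paap F statistic should be close to the squared first-stage t statistic, which for the output in #1 is about 1.41^2 = 1.99, well below the common rule-of-thumb value of 10.

```stata
* One-time installation (assumed to be available from SSC):
* ssc install ftools
* ssc install reghdfe
* ssc install ivreg2
* ssc install ivreghdfe

* "outcome" is a HYPOTHETICAL second-stage dependent variable.
* yoe is instrumented by youngXnin; region-of-birth and year-of-birth
* fixed effects are absorbed; standard errors are clustered by birthpl.
* The option first asks for the first-stage regression to be displayed.
ivreghdfe outcome i.yob#c.en71 i.yob#c.ch71 i.yob#c.wsppc female ///
    (yoe = youngXnin) if tag==1, absorb(birthpl yob) cluster(birthpl) first
```

The weak-identification section of the output is the part to look at; the coefficient on yoe is hard to interpret if the instrument turns out to be weak.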