Dear Statalist, I am using firm-level panel data to study how some sectoral characteristics affect firm performance, and how a firm characteristic may moderate this effect. I am estimating a fixed-effects model with “reghdfe”, in which x1 is the firm-level moderator and z1 and z2 are the main sectoral variables. The remaining variables (x2, x3 and z3) are firm and sectoral controls. The model includes year and firm fixed effects, and standard errors are clustered at the sectoral level (30 groups).
However, the moderator (x1) is likely to be endogenous, and I would like to check whether lagged values of the moderator can be used as instruments.
Code:
reghdfe y L.c.x2 L.c.x3 L.c.x1##(L.c.z1 L.c.z2) L.c.z3 if sample==1, absorb(year id) cluster(sec)
I am wondering if you could point me in the right direction on this issue; perhaps Professor Jeff Wooldridge could give me some advice. After reading Lin and Wooldridge (2019), “Testing and Correcting for Endogeneity in Nonlinear Unobserved Effects Models”, I learned about the control function approach for correcting endogeneity. Since in my firm-year panel the endogenous variable is a firm-level variable (L.c.x1) and I plan to use its lagged values (L2.c.x1 and L3.c.x1) as instruments, I wonder whether the problem can be addressed with the control function approach.
If I am not mistaken, the two conditions for IV to be valid are:
- The instruments must be correlated with the endogenous variable.
- The instruments must NOT be correlated with the main dependent variable.
My procedure (commands below) is, I believe, similar to the approach in Lin and Wooldridge’s paper, although I believe they recommend estimating the first stage as an FE regression (not GMM) in the linear case. Could you help me understand whether this procedure can be applied? If not, what would be an alternative? The commands are shown below.
Thanks!
Code:
* First Stage
. xtset id year
       panel variable:  id (unbalanced)
        time variable:  year, 2004 to 2016, but with gaps
                delta:  1 unit
. xtabond2 x1 L.c.x1 L2.c.x1 L.x2 L.x3 L.z1 L.z2 L.z3 if sample==1, gmm(L.c.x1 L2.c.x1) iv(L.x2 L.x3 L.z1 L.z2 L.z3) twostep robust
Favoring space over speed. To switch, type or click on mata: mata set matafavor speed, perm.
Warning: Two-step estimated covariance matrix of moments is singular.
  Using a generalized inverse to calculate optimal weighting matrix for two-step estimation.
  Difference-in-Sargan/Hansen statistics may be negative.
DFm 7

Dynamic panel-data estimation, two-step system GMM
------------------------------------------------------------------------------
Group variable: id                              Number of obs      =     48148
Time variable : year                            Number of groups   =      7756
Number of instruments = 94                      Obs per group: min =         1
Wald chi2(7)  =    111.58                                      avg =      6.21
Prob > chi2   =     0.000                                      max =        10
------------------------------------------------------------------------------
             |              Corrected
          x1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |
         L1. |   .1343917   .0154153     8.72   0.000     .1041782    .1646052
         L2. |   .0059877   .0074001     0.81   0.418    -.0085161    .0204916
             |
          x2 |
         L1. |  -.0105319   .0137286    -0.77   0.443    -.0374394    .0163756
             |
          x3 |
         L1. |  -.0001653    .000155    -1.07   0.286    -.0004692    .0001386
             |
          z1 |
         L1. |  -.0005157   .0060638    -0.09   0.932    -.0124005    .0113691
             |
          z2 |
         L1. |  -.0147577   .0058943    -2.50   0.012    -.0263103    -.003205
             |
          z3 |
         L1. |   .0008048   .0053182     0.15   0.880    -.0096186    .0112283
             |
       _cons |   .0130476   .0103695     1.26   0.208    -.0072764    .0333715
------------------------------------------------------------------------------
Instruments for first differences equation
  Standard
    D.(L.x2 L.x3 L.z1 L.z2 L.z3)
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    L(1/12).(L.x1 L2.x1)
Instruments for levels equation
  Standard
    L.x2 L.x3 L.z1 L.z2 L.z3
    _cons
  GMM-type (missing=0, separate instruments for each period unless collapsed)
    D.(L.x1 L2.x1)
------------------------------------------------------------------------------
Arellano-Bond test for AR(1) in first differences: z = -13.47  Pr > z =  0.000
Arellano-Bond test for AR(2) in first differences: z =   0.06  Pr > z =  0.949
------------------------------------------------------------------------------
Sargan test of overid. restrictions: chi2(86)   = 687.97  Prob > chi2 =  0.000
  (Not robust, but not weakened by many instruments.)
Hansen test of overid. restrictions: chi2(86)   = 113.76  Prob > chi2 =  0.024
  (Robust, but weakened by many instruments.)

Difference-in-Hansen tests of exogeneity of instrument subsets:
  GMM instruments for levels
    Hansen test excluding group:     chi2(67)   =  75.34  Prob > chi2 =  0.227
    Difference (null H = exogenous): chi2(19)   =  38.42  Prob > chi2 =  0.005
  iv(L.x2 L.x3 L.z1 L.z2 L.z3)
    Hansen test excluding group:     chi2(81)   = 101.15  Prob > chi2 =  0.064
    Difference (null H = exogenous): chi2(5)    =  12.61  Prob > chi2 =  0.027

. predict e_x1, residuals
(87,582 missing values generated)
.
. foreach x in x1 e_x1 x2 x3 z1 z2 z3 {
  2.     gen L`x' = L.`x'
  3.     gen L2`x' = L2.`x'
  4. }
(48,902 missing values generated)
(55,345 missing values generated)
(94,488 missing values generated)
(98,782 missing values generated)
(70,108 missing values generated)
(74,486 missing values generated)
(39,652 missing values generated)
(46,095 missing values generated)
(15,349 missing values generated)
(26,927 missing values generated)
(24,466 missing values generated)
(36,044 missing values generated)
(12,301 missing values generated)
(24,146 missing values generated)
.
. * Second Stage (using bootstrap because of the generated residuals before)
. xtset id year
       panel variable:  id (unbalanced)
        time variable:  year, 2004 to 2016, but with gaps
                delta:  1 unit
. bootstrap, reps(200) seed(12345) cluster(sec) idcluster(new_id) group(id): reghdfe y c.Le_x1 c.Lx1##(c.Lz1 c.Lz2) Lz3 Lx2 Lx3 if sample==1, absorb(year id)
(running reghdfe on estimation sample)

Bootstrap replications (200)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200

HDFE Linear regression                            Number of obs   =     38,859
Absorbing 2 HDFE groups                           Wald chi2(9)    =     229.46
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.5848
                                                  Adj R-squared   =     0.5011
                                                  Within R-sq.    =     0.0098
                                                  Root MSE        =     4.7168
                                    (Replications based on 30 clusters in sec)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       Le_x1 |    .132053   .1288464     1.02   0.305    -.1204812    .3845873
         Lx1 |  -.1303763   .1272809    -1.02   0.306    -.3798423    .1190897
         Lz1 |   .5175391   .3034185     1.71   0.088    -.0771502    1.112228
         Lz2 |   .4737847    .285459     1.66   0.097    -.0857047    1.033274
             |
 c.Lx1#c.Lz1 |   .1022605   .0498364     2.05   0.040     .0045829    .1999382
             |
 c.Lx1#c.Lz2 |  -.0885264   .0773222    -1.14   0.252    -.2400751    .0630223
             |
         Lz3 |  -.1046715   .4019864    -0.26   0.795    -.8925503    .6832074
         Lx2 |    .898409   .1043804     8.61   0.000     .6938271    1.102991
         Lx3 |   .0109577   .0010842    10.11   0.000     .0088326    .0130828
       _cons |   3.717599   .2234197    16.64   0.000     3.279705    4.155494
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        year |         9           0           9     |
          id |      6501           1        6500     |
-----------------------------------------------------+
Code:
. * Looking if lags are correlated with y (without the correction)
. xtset id year
       panel variable:  id (unbalanced)
        time variable:  year, 2004 to 2016, but with gaps
                delta:  1 unit
. reghdfe y c.L2.x1 c.L3.x1 c.L.x1##(c.L.z1 c.L.z2) L.z3 L.x2 L.x3 if sample==1, absorb(year id)
(dropped 578 singleton observations)
(MWFE estimator converged in 6 iterations)

HDFE Linear regression                            Number of obs   =     40,398
Absorbing 2 HDFE groups                           F(  10,  33749) =      32.94
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.5839
                                                  Adj R-squared   =     0.5020
                                                  Within R-sq.    =     0.0097
                                                  Root MSE        =     4.6859
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |
         L2. |   -.011996   .0209289    -0.57   0.567    -.0530173    .0290253
         L3. |  -.0080883    .023283    -0.35   0.728    -.0537237    .0375472
         L1. |   .0024909   .0220422     0.11   0.910    -.0407126    .0456943
             |
          z1 |
         L1. |   .4306778   .1623386     2.65   0.008     .1124886    .7488671
             |
          z2 |
         L1. |    .480526   .1679825     2.86   0.004     .1512746    .8097774
             |
 cL.x1#cL.z1 |   .0998922   .0451363     2.21   0.027     .0114235     .188361
             |
 cL.x1#cL.z2 |  -.0888344   .0474547    -1.87   0.061    -.1818473    .0041785
             |
          z3 |
         L1. |   .0824461   .2935659     0.28   0.779    -.4929531    .6578452
             |
          x2 |
         L1. |   .8748987   .0775557    11.28   0.000     .7228869     1.02691
             |
          x3 |
         L1. |   .0110253   .0008895    12.39   0.000     .0092818    .0127688
             |
       _cons |   3.619166   .0664242    54.49   0.000     3.488972     3.74936
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        year |         9           0           9     |
          id |      6631           1        6630     |
-----------------------------------------------------+
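For comparison, the FE first-stage variant of the control function approach mentioned above could be sketched roughly as follows. This is only a sketch under assumptions, not a validated procedure: it reuses the variable names from my commands, the names x1_lag and v_hat are hypothetical, and it assumes a reghdfe version that supports the residuals() option. The second-stage standard errors here ignore that v_hat is a generated regressor, so they would still need to be bootstrapped as in my second stage.

```stata
* Sketch only; x1_lag and v_hat are hypothetical names introduced for illustration
xtset id year
gen double x1_lag = L.x1

* First stage by FE: endogenous L.x1 on the excluded instruments (L2.x1, L3.x1)
* and the exogenous regressors, saving the residuals as v_hat
reghdfe x1_lag L2.x1 L3.x1 L.x2 L.x3 L.z1 L.z2 L.z3 if sample==1, ///
    absorb(year id) cluster(sec) residuals(v_hat)

* Relevance check: joint significance of the excluded instruments
test L2.x1 L3.x1

* Second stage: add the first-stage residual as a control function
reghdfe y v_hat c.x1_lag##(c.L.z1 c.L.z2) L.z3 L.x2 L.x3 if sample==1, ///
    absorb(year id) cluster(sec)

* A significant coefficient on v_hat would point to endogeneity of L.x1
test v_hat
```

Because the first-stage dependent variable is already L.x1, the saved residual lines up with the second-stage regressor without being lagged again.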