Dear Statalist,
I have a probit model with several interaction terms. The model is:
Code:
probit DV c.var1##i.H_d1 c.var1##i.H_d2 c.var1##i.H_d3 controls i.industry i.year, vce(cluster symbol)

where DV is a dummy (hence the probit model); var1 is a continuous variable; and H_d1, H_d2, and H_d3 are three dummies, with H_d1 = 1 if the value of d1 is above the industry median of d1 in that year, and H_d2 and H_d3 defined analogously.
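In case it helps to see the terms written out, I believe the factor-variable shorthand above is equivalent to listing the main effects and the interactions explicitly (just a sketch of how I read the ## notation):

Code:
* sketch: c.var1##i.H_d1 etc. should expand to the main effects plus the interactions
probit DV c.var1 i.H_d1 i.H_d2 i.H_d3 ///
    c.var1#i.H_d1 c.var1#i.H_d2 c.var1#i.H_d3 ///
    controls i.industry i.year, vce(cluster symbol)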
I use the following code to generate H_d1 (same method for H_d2 and H_d3):
Code:
egen quantiles_d1 = xtile(d1), by(year industry) nq(2)
gen H_d1 = 1 if quantiles_d1 == 1
replace H_d1 = 0 if quantiles_d1 == 2
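(I think the gen/replace pair is equivalent to the one-line version below, assuming quantiles_d1 only takes the values 1, 2, or missing, but I kept the longer form above.)

Code:
* sketch of the same coding in one step; observations with missing quantiles_d1 stay missing
gen H_d1 = (quantiles_d1 == 1) if !missing(quantiles_d1)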
Later I want to use L_d1, L_d2, and L_d3 instead (coded 1 if below the industry median in a year), so I use the following code to generate L_d1 (same method for L_d2 and L_d3):

Code:
egen quantiles_d1 = xtile(d1), by(year industry) nq(2)
gen L_d1 = 1 if quantiles_d1 == 2
replace L_d1 = 0 if quantiles_d1 == 1

So basically, L_d1 = 1 exactly when H_d1 = 0.
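(Which, I suppose, means L_d1 could also have been generated directly from H_d1; just a sketch, and missing values of H_d1 would stay missing.)

Code:
* sketch: L_d1 as the complement of H_d1
gen L_d1 = 1 - H_d1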
I expected the regression results for these variables to have the same coefficients with opposite signs. That is indeed the case for the dummies alone and for the interaction terms, but not for var1. Specifically, the results look like this.

Using H_d1 H_d2 H_d3:
Code:
-----------------------------------------------------------------------------
             |               Robust
          DV |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
        var1 |   4.429392   .8964463     4.94   0.000      2.67239    6.186395
             |
      1.H_d1 |    .343354   .1052533     3.26   0.001     .1370614    .5496466
             |
 H_d1#c.var1 |
           1 |  -1.466482   .6460026    -2.27   0.023    -2.732623   -.2003396
             |
      1.H_d2 |   .2115119   .0902183     2.34   0.019     .0346873    .3883365
             |
 H_d2#c.var1 |
           1 |  -1.016277   .6039841    -1.68   0.092    -2.200065    .1675097
             |
      1.H_d3 |   .2059276    .095487     2.16   0.031     .0187765    .3930786
             |
 H_d3#c.var1 |
           1 |  -1.638603    .590966    -2.77   0.006    -2.796875   -.4803306
Using L_d1 L_d2 L_d3:

Code:
-----------------------------------------------------------------------------
             |               Robust
          DV |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+---------------------------------------------------------------
        var1 |   .3080308   .8580195     0.36   0.720    -1.373656    1.989718
             |
      1.L_d1 |   -.343354   .1052533    -3.26   0.001    -.5496466   -.1370614
             |
 L_d1#c.var1 |
           1 |   1.466482   .6460026     2.27   0.023     .2003396    2.732623
             |
      1.L_d2 |  -.2115119   .0902183    -2.34   0.019    -.3883365   -.0346873
             |
 L_d2#c.var1 |
           1 |   1.016277   .6039841     1.68   0.092    -.1675097    2.200065
             |
      1.L_d3 |  -.2059276    .095487    -2.16   0.031    -.3930786   -.0187765
             |
 L_d3#c.var1 |
           1 |   1.638603    .590966     2.77   0.006     .4803306    2.796875
I wonder why var1 has such a different coefficient and significance level in the two regressions. I'm really confused... Please kindly give me some suggestions, thanks!
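In case it is relevant, after each regression I was planning to check the var1 slope with something like this (just a sketch; the coefficient names are copied from the output above):

Code:
* after the H_d version: slope of var1 when H_d1 = H_d2 = H_d3 = 1
lincom var1 + 1.H_d1#c.var1 + 1.H_d2#c.var1 + 1.H_d3#c.var1

* average marginal effect of var1, which should not depend on how the dummies are coded
margins, dydx(var1)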