Hello,
According to the manual https://www.stata.com/manuals/rheckm...estimation.pdf,
ycond calculates the expected value of the dependent variable conditional on the dependent variable being observed, that is, selected; E(yj | yj observed).
Therefore, I thought the results from margins, dydx(*) predict(ycond) (result B) should be the same as the regression output from heckman twostep (result A) when I use the same set of variables for the selection equation and the outcome equation, because they both estimate the intensive margin.
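For reference, the textbook form of this conditional mean in the Heckman selection model, and its derivative with respect to a continuous regressor that appears in both equations, can be sketched as follows. The notation is introduced here for illustration and is not quoted from the manual: \beta_k and \gamma_k are the outcome and selection coefficients on the same regressor, \phi and \Phi are the standard normal density and distribution function, and whether Stata evaluates exactly this expression is an assumption.

```latex
\begin{align*}
y_j &= x_j\beta + u_{1j}, \qquad y_j \text{ observed if } z_j\gamma + u_{2j} > 0,\\
u_{1j} &\sim N(0,\sigma^2), \quad u_{2j} \sim N(0,1), \quad \operatorname{corr}(u_{1j},u_{2j}) = \rho,\\
E(y_j \mid y_j \text{ observed}) &= x_j\beta + \rho\sigma\,\lambda(z_j\gamma), \qquad \lambda(a) = \frac{\phi(a)}{\Phi(a)},\\
\frac{\partial E(y_j \mid y_j \text{ observed})}{\partial x_{jk}} &= \beta_k - \rho\sigma\,\gamma_k\,\lambda(z_j\gamma)\,\bigl[z_j\gamma + \lambda(z_j\gamma)\bigr].
\end{align*}
```

Under this form, the derivative equals \beta_k only when \rho = 0 or the regressor is absent from the selection equation (\gamma_k = 0).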
Code:
. use https://www.stata-press.com/data/r17/womenwk, clear

. heckman wage educ age married children, select(married children educ age) twostep

Heckman selection model -- two-step estimates   Number of obs     =      2,000
(regression model with sample selection)              Selected    =      1,343
                                                      Nonselected =        657

                                                Wald chi2(4)      =     465.93
                                                Prob > chi2       =     0.0000

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9564292   .0682882    14.01   0.000     .8225867    1.090272
         age |   .1976756   .0320365     6.17   0.000     .1348852     .260466
     married |  -.0591138   .4518117    -0.13   0.896    -.9446484    .8264209
    children |  -.1685601   .3002736    -0.56   0.575    -.7570855    .4199653
       _cons |   2.373219   3.156035     0.75   0.452    -3.812495    8.558934
-------------+----------------------------------------------------------------
select       |
     married |   .4308575    .074208     5.81   0.000     .2854125    .5763025
    children |   .4473249   .0287417    15.56   0.000     .3909922    .5036576
   education |   .0583645   .0109742     5.32   0.000     .0368555    .0798735
         age |   .0347211   .0042293     8.21   0.000     .0264318    .0430105
       _cons |  -2.467365   .1925635   -12.81   0.000    -2.844782   -2.089948
-------------+----------------------------------------------------------------
/mills       |
      lambda |   3.129573    1.67778     1.87   0.062    -.1588145    6.417961
-------------+----------------------------------------------------------------
         rho |    0.54691
       sigma |   5.722303
------------------------------------------------------------------------------
Code:
. heckman wage educ age married children, select(married children educ age)

Iteration 0:   log likelihood = -5180.3264
Iteration 1:   log likelihood =  -5178.437
Iteration 2:   log likelihood = -5178.2598
Iteration 3:   log likelihood = -5178.2594

Heckman selection model                         Number of obs     =      2,000
(regression model with sample selection)              Selected    =      1,343
                                                      Nonselected =        657

                                                Wald chi2(4)      =     508.96
Log likelihood = -5178.259                      Prob > chi2       =     0.0000

------------------------------------------------------------------------------
        wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
wage         |
   education |   .9911275   .0556766    17.80   0.000     .8820033    1.100252
         age |   .2131395   .0217364     9.81   0.000     .1705369    .2557421
     married |   .0857615   .3851295     0.22   0.824    -.6690785    .8406015
    children |   .0342465   .1406193     0.24   0.808    -.2413622    .3098552
       _cons |   .3066527   1.322947     0.23   0.817    -2.286276    2.899582
-------------+----------------------------------------------------------------
select       |
     married |   .4506089   .0726849     6.20   0.000      .308149    .5930687
    children |   .4394905   .0279758    15.71   0.000      .384659    .4943221
   education |   .0554688   .0107712     5.15   0.000     .0343576    .0765801
         age |   .0364324   .0041711     8.73   0.000     .0282571    .0446077
       _cons |  -2.490319   .1894902   -13.14   0.000    -2.861713   -2.118925
-------------+----------------------------------------------------------------
     /athrho |   .8942646   .1255919     7.12   0.000      .648109     1.14042
    /lnsigma |   1.797193   .0326504    55.04   0.000       1.7332    1.861187
-------------+----------------------------------------------------------------
         rho |   .7134937   .0616564                      .5703956    .8145555
       sigma |   6.032692   .1969699                      5.658731    6.431366
      lambda |   4.304288    .491931                      3.340121    5.268455
------------------------------------------------------------------------------
LR test of indep. eqns. (rho = 0):   chi2(1) =    17.13   Prob > chi2 = 0.0000

. margins, dydx(*) predict(ycond)

Average marginal effects                        Number of obs     =      2,000
Model VCE    : OIM

Expression   : E(wage|Zg>0), predict(ycond)
dy/dx w.r.t. : education age married children

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   education |   .8765307   .0495022    17.71   0.000     .7795082    .9735531
         age |   .1378713   .0191114     7.21   0.000     .1004136     .175329
     married |  -.8451819   .3509667    -2.41   0.016    -1.533064   -.1572997
    children |  -.8737267   .1063705    -8.21   0.000    -1.082209   -.6652444
------------------------------------------------------------------------------
I expected result A to match result B, but they differ. Why would they differ if both are essentially estimating the same thing, the intensive margin of wage? Does anyone know where I can find the exact formulas Stata uses to compute margins, predict(ycond) after heckman? I have read the manual entries for heckman, heckman postestimation, and margins, and I cannot find them. Thank you.
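One way to probe what predict(ycond) is doing is to rebuild it by hand after the maximum likelihood fit shown above. The sketch below assumes the textbook conditional mean xb + rho*sigma*lambda(zg), with lambda(a) = normalden(a)/normal(a); that formula is an assumption here, not something quoted from the manual. The variable names xb, zg, lam, yc_stata, yc_hand, and me_educ are made up for illustration.

```stata
use https://www.stata-press.com/data/r17/womenwk, clear
quietly heckman wage educ age married children, select(married children educ age)

* linear predictions from the outcome and selection equations
predict double xb, xb
predict double zg, xbsel

* Stata's own E(wage | wage observed)
predict double yc_stata, ycond

* assumed textbook formula; e(lambda) stores the estimate of rho*sigma
generate double lam = normalden(zg)/normal(zg)
generate double yc_hand = xb + e(lambda)*lam
summarize yc_stata yc_hand

* analytic derivative of the assumed formula with respect to education
generate double me_educ = _b[wage:education] - e(lambda)*_b[select:education]*lam*(zg + lam)
summarize me_educ
```

If the assumed formula is what Stata uses, the two columns in the first summarize should agree, and the mean of me_educ should be close to the education row of the margins output rather than to the raw outcome coefficient.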