Dear All,
I am using the high-dimensional fixed-effects commands by Sergio Correia (reghdfe and ppmlhdfe) to estimate the association between the variables LgAneedc and Lgm. A dataex example and the full output of both commands are included below.
Why do I get a different sample size with the linear (reghdfe N=6,593) vs. Poisson pseudo-ML (ppmlhdfe N=5,074) regression?
Initially I thought it had something to do with dropping the singletons iteratively. But I believe that is done by both reghdfe and ppmlhdfe, so now I am even less sure why N differs between the two.
Thank you in advance for any help you may be able to offer.
* Example generated by -dataex-. For more info, type help dataex
clear
input float year long id float(numbersibs LgAneedc Lgm)
2001 5003 1 4.931884 11.599216
2003 5003 1 0 11.334538
2005 5003 1 0 10.918878
2007 5003 1 0 10.61348
2009 5003 1 0 10.765062
2011 5003 1 0 11.425528
2013 5003 1 0 11.290032
2015 5003 1 0 11.40166
2017 5003 1 0 11.40609
2001 6004 2 0 11.729268
2003 6004 2 0 11.085595
2005 6004 2 0 10.971986
2007 6004 2 4.775486 10.862973
2009 6004 2 0 11.12455
2011 6004 2 0 11.204343
2013 6004 2 0 11.139434
2015 6004 2 0 10.820865
2017 6004 2 0 10.98978
2001 6006 2 5.843838 12.3512
2003 6006 2 0 11.960934
2005 6006 2 0 12.428912
2007 6006 2 0 12.19357
2009 6006 2 0 11.91282
2011 6006 2 0 11.81625
2013 6006 2 0 12.28847
2015 6006 2 0 12.474733
2017 6006 2 0 12.41988
2001 6030 1 0 11.545062
2003 6030 1 4.888463 11.73737
2005 6030 1 6.625171 11.590188
2007 6030 1 0 11.644321
2009 6030 1 4.425613 11.913424
2011 6030 1 6.299826 11.986324
2013 6030 1 5.334969 12.031132
2015 6030 1 6.21779 12.37371
2017 6030 1 6.204095 12.397298
2001 7004 1 0 10.49827
2003 7004 1 0 9.740772
2005 7004 1 0 9.69968
2007 7004 1 0 9.634616
2009 7004 1 0 10.405068
2011 7004 1 0 10.392364
2013 7004 1 0 10.340804
2015 7004 1 0 10.310172
2017 7004 1 0 9.38021
2001 7033 1 0 10.667672
2003 7033 1 0 9.923084
2005 7033 1 0 9.459681
2007 7033 1 0 10.595987
2009 7033 1 0 10.963642
2011 7033 1 0 10.644748
2013 7033 1 0 9.751442
2015 7033 1 0 10.06172
2017 7033 1 0 9.728492
2001 7035 1 0 10.208698
2003 7035 1 0 10.33924
2005 7035 1 0 10.411844
2007 7035 1 0 10.72443
2009 7035 1 0 10.167242
2011 7035 1 4.1929417 10.323373
2013 7035 1 0 10.738003
2015 7035 1 0 10.31622
2017 7035 1 0 8.544072
2001 10003 1 0 9.926491
2003 10003 1 0 9.234111
2005 10003 1 0 9.631241
2007 10003 1 0 10.21517
2009 10003 1 0 9.616987
2011 10003 1 0 9.828395
2013 10003 1 0 9.608812
2015 10003 1 0 9.614655
2017 10003 1 0 9.839711
2001 10006 2 0 10.01666
2003 10006 2 0 9.74325
2005 10006 2 0 9.017304
2007 10006 2 0 10.49909
2009 10006 2 0 10.598434
2011 10006 2 0 10.445745
2013 10006 2 0 10.51307
2015 10006 2 0 10.820985
2017 10006 2 0 10.558806
2001 10007 2 0 10.485355
2003 10007 2 0 10.47935
2005 10007 2 0 10.353577
2007 10007 2 0 10.339203
2009 10007 2 0 10.181932
2011 10007 2 0 10.22087
2013 10007 2 0 10.035196
2015 10007 2 0 8.344264
2017 10007 2 0 9.710689
2001 11002 1 0 11.431934
2003 11002 1 0 10.52496
2005 11002 1 4.1547513 10.495072
2007 11002 1 0 10.365026
2009 11002 1 0 10.527725
2011 11002 1 0 10.95044
2013 11002 1 6.248363 10.213922
2015 11002 1 3.932989 9.716777
2017 11002 1 3.246045 9.670991
2001 11003 1 0 10.694862
end
. reghdfe LgAneedc Lgm if (numbersibs>1), ///
>     absorb(id year, save) cluster(id)
(MWFE estimator converged in 3 iterations)

HDFE Linear regression                            Number of obs   =      6,593
Absorbing 2 HDFE groups                           F(   1,    732) =       9.06
Statistics robust to heteroskedasticity           Prob > F        =     0.0027
                                                  R-squared       =     0.4417
                                                  Adj R-squared   =     0.3709
                                                  Within R-sq.    =     0.0008
Number of clusters (id)      =        733         Root MSE        =     2.2356

                                   (Std. err. adjusted for 733 clusters in id)
------------------------------------------------------------------------------
             |               Robust
    LgAneedc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         Lgm |   .1044025   .0346949     3.01   0.003     .0362891    .1725158
       _cons |   .8151537   .3910215     2.08   0.037     .0474964    1.582811
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          id |       733         733           0    *|
        year |         9           0           9     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. estimates store estfe1id

.
. ppmlhdfe LgAneedc Lgm if (numbersibs>1), ///
>     absorb(id year, save) cluster(id)
(dropped 1519 observations that are either singletons or separated by a fixed effect)
Iteration 1:   deviance = 1.5840e+04  eps = .         iters = 3    tol = 1.0e-04  min(eta) =  -2.12  P
Iteration 2:   deviance = 1.5411e+04  eps = 2.79e-02  iters = 2    tol = 1.0e-04  min(eta) =  -2.78
Iteration 3:   deviance = 1.5401e+04  eps = 6.81e-04  iters = 2    tol = 1.0e-04  min(eta) =  -3.42
Iteration 4:   deviance = 1.5400e+04  eps = 1.16e-05  iters = 2    tol = 1.0e-04  min(eta) =  -3.73
Iteration 5:   deviance = 1.5400e+04  eps = 1.94e-07  iters = 2    tol = 1.0e-05  min(eta) =  -3.79
Iteration 6:   deviance = 1.5400e+04  eps = 1.76e-10  iters = 2    tol = 1.0e-06  min(eta) =  -3.79   S
Iteration 7:   deviance = 1.5400e+04  eps = 1.75e-16  iters = 2    tol = 1.0e-08  min(eta) =  -3.79   S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 7 iterations and 15 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =      5,074
Absorbing 2 HDFE groups                           Residual df     =        563
Statistics robust to heteroskedasticity           Wald chi2(1)    =       8.16
Deviance             =  15400.4716                Prob > chi2     =     0.0043
Log pseudolikelihood = -11838.62215               Pseudo R2       =     0.2007

Number of clusters (id)     =        564
                                   (Std. err. adjusted for 564 clusters in id)
------------------------------------------------------------------------------
             |               Robust
    LgAneedc | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         Lgm |    .115435   .0403987     2.86   0.004      .036255    .1946149
       _cons |  -.1761017    .473641    -0.37   0.710    -1.104421    .7522175
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          id |       564         564           0    *|
        year |         9           0           9     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

. estimates store estPOISfe1id
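One check that might help narrow this down: the ppmlhdfe log reports 1,519 dropped observations, which equals the gap in N (6,593 - 5,074 = 1,519). The sketch below is not taken from either command's internals. It assumes that "separated by a fixed effect" refers to ids whose non-negative outcome is zero in every year, and it uses a small toy panel with hypothetical values and hypothetical variable names (y, maxy, allzero, singleton) to show how such ids and singleton ids could be flagged.

```stata
* Toy panel (hypothetical values): id 1 has a positive outcome in one year,
* id 2 is zero in every year, id 3 has a single observation (a singleton).
clear
input int year long id float y
2001 1 4.9
2003 1 0
2005 1 0
2001 2 0
2003 2 0
2005 2 0
2001 3 5.8
end

* Flag ids whose outcome never exceeds zero. Assumption: with a non-negative
* outcome, these are the groups an id fixed effect would separate.
bysort id: egen float maxy = max(y)
generate byte allzero = (maxy == 0)

* Flag ids observed only once (singleton groups).
bysort id: generate byte singleton = (_N == 1)

* Number of observations in each category.
count if allzero
count if singleton & !allzero
```

On the actual data, the same idea would use LgAneedc in place of y and restrict every step to numbersibs>1. If the count of all-zero observations is close to 1,519, that would point to separation rather than singletons as the source of the difference.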