
  • Difference in sample size reghdfe vs ppmlhdfe

    Dear All,

    I am using Sergio Correia's high-dimensional fixed-effects commands (reghdfe and ppmlhdfe) to estimate the association between the variables LgAneedc and Lgm. Here is a dataex example:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float year long id float(numbersibs LgAneedc Lgm)
    2001  5003 1  4.931884 11.599216
    2003  5003 1         0 11.334538
    2005  5003 1         0 10.918878
    2007  5003 1         0  10.61348
    2009  5003 1         0 10.765062
    2011  5003 1         0 11.425528
    2013  5003 1         0 11.290032
    2015  5003 1         0  11.40166
    2017  5003 1         0  11.40609
    2001  6004 2         0 11.729268
    2003  6004 2         0 11.085595
    2005  6004 2         0 10.971986
    2007  6004 2  4.775486 10.862973
    2009  6004 2         0  11.12455
    2011  6004 2         0 11.204343
    2013  6004 2         0 11.139434
    2015  6004 2         0 10.820865
    2017  6004 2         0  10.98978
    2001  6006 2  5.843838   12.3512
    2003  6006 2         0 11.960934
    2005  6006 2         0 12.428912
    2007  6006 2         0  12.19357
    2009  6006 2         0  11.91282
    2011  6006 2         0  11.81625
    2013  6006 2         0  12.28847
    2015  6006 2         0 12.474733
    2017  6006 2         0  12.41988
    2001  6030 1         0 11.545062
    2003  6030 1  4.888463  11.73737
    2005  6030 1  6.625171 11.590188
    2007  6030 1         0 11.644321
    2009  6030 1  4.425613 11.913424
    2011  6030 1  6.299826 11.986324
    2013  6030 1  5.334969 12.031132
    2015  6030 1   6.21779  12.37371
    2017  6030 1  6.204095 12.397298
    2001  7004 1         0  10.49827
    2003  7004 1         0  9.740772
    2005  7004 1         0   9.69968
    2007  7004 1         0  9.634616
    2009  7004 1         0 10.405068
    2011  7004 1         0 10.392364
    2013  7004 1         0 10.340804
    2015  7004 1         0 10.310172
    2017  7004 1         0   9.38021
    2001  7033 1         0 10.667672
    2003  7033 1         0  9.923084
    2005  7033 1         0  9.459681
    2007  7033 1         0 10.595987
    2009  7033 1         0 10.963642
    2011  7033 1         0 10.644748
    2013  7033 1         0  9.751442
    2015  7033 1         0  10.06172
    2017  7033 1         0  9.728492
    2001  7035 1         0 10.208698
    2003  7035 1         0  10.33924
    2005  7035 1         0 10.411844
    2007  7035 1         0  10.72443
    2009  7035 1         0 10.167242
    2011  7035 1 4.1929417 10.323373
    2013  7035 1         0 10.738003
    2015  7035 1         0  10.31622
    2017  7035 1         0  8.544072
    2001 10003 1         0  9.926491
    2003 10003 1         0  9.234111
    2005 10003 1         0  9.631241
    2007 10003 1         0  10.21517
    2009 10003 1         0  9.616987
    2011 10003 1         0  9.828395
    2013 10003 1         0  9.608812
    2015 10003 1         0  9.614655
    2017 10003 1         0  9.839711
    2001 10006 2         0  10.01666
    2003 10006 2         0   9.74325
    2005 10006 2         0  9.017304
    2007 10006 2         0  10.49909
    2009 10006 2         0 10.598434
    2011 10006 2         0 10.445745
    2013 10006 2         0  10.51307
    2015 10006 2         0 10.820985
    2017 10006 2         0 10.558806
    2001 10007 2         0 10.485355
    2003 10007 2         0  10.47935
    2005 10007 2         0 10.353577
    2007 10007 2         0 10.339203
    2009 10007 2         0 10.181932
    2011 10007 2         0  10.22087
    2013 10007 2         0 10.035196
    2015 10007 2         0  8.344264
    2017 10007 2         0  9.710689
    2001 11002 1         0 11.431934
    2003 11002 1         0  10.52496
    2005 11002 1 4.1547513 10.495072
    2007 11002 1         0 10.365026
    2009 11002 1         0 10.527725
    2011 11002 1         0  10.95044
    2013 11002 1  6.248363 10.213922
    2015 11002 1  3.932989  9.716777
    2017 11002 1  3.246045  9.670991
    2001 11003 1         0 10.694862
    end
    Why do I get different sample sizes from the linear regression (reghdfe, N = 6,593) and the Poisson pseudo-maximum-likelihood regression (ppmlhdfe, N = 5,074)?

    Code:
    . reghdfe LgAneedc        Lgm     if (numbersibs>1), ///
    >         absorb(id year, save) cluster(id)
    (MWFE estimator converged in 3 iterations)
    
    HDFE Linear regression                            Number of obs   =      6,593
    Absorbing 2 HDFE groups                           F(   1,    732) =       9.06
    Statistics robust to heteroskedasticity           Prob > F        =     0.0027
                                                      R-squared       =     0.4417
                                                      Adj R-squared   =     0.3709
                                                      Within R-sq.    =     0.0008
    Number of clusters (id)      =        733         Root MSE        =     2.2356
    
                                       (Std. err. adjusted for 733 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
        LgAneedc | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             Lgm |   .1044025   .0346949     3.01   0.003     .0362891    .1725158
           _cons |   .8151537   .3910215     2.08   0.037     .0474964    1.582811
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
              id |       733         733           0    *|
            year |         9           0           9     |
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    
    .         estimates store estfe1id        
    
    .         
    . ppmlhdfe LgAneedc       Lgm     if (numbersibs>1), ///
    >         absorb(id year, save) cluster(id)
    (dropped 1519 observations that are either singletons or separated by a fixed effect)
    Iteration 1:   deviance = 1.5840e+04  eps = .         iters = 3    tol = 1.0e-04  min(eta) =  -2.12  P   
    Iteration 2:   deviance = 1.5411e+04  eps = 2.79e-02  iters = 2    tol = 1.0e-04  min(eta) =  -2.78      
    Iteration 3:   deviance = 1.5401e+04  eps = 6.81e-04  iters = 2    tol = 1.0e-04  min(eta) =  -3.42      
    Iteration 4:   deviance = 1.5400e+04  eps = 1.16e-05  iters = 2    tol = 1.0e-04  min(eta) =  -3.73      
    Iteration 5:   deviance = 1.5400e+04  eps = 1.94e-07  iters = 2    tol = 1.0e-05  min(eta) =  -3.79      
    Iteration 6:   deviance = 1.5400e+04  eps = 1.76e-10  iters = 2    tol = 1.0e-06  min(eta) =  -3.79   S  
    Iteration 7:   deviance = 1.5400e+04  eps = 1.75e-16  iters = 2    tol = 1.0e-08  min(eta) =  -3.79   S O
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
    Converged in 7 iterations and 15 HDFE sub-iterations (tol = 1.0e-08)
    
    HDFE PPML regression                              No. of obs      =      5,074
    Absorbing 2 HDFE groups                           Residual df     =        563
    Statistics robust to heteroskedasticity           Wald chi2(1)    =       8.16
    Deviance             =   15400.4716               Prob > chi2     =     0.0043
    Log pseudolikelihood = -11838.62215               Pseudo R2       =     0.2007
    
    Number of clusters (id)     =        564
                                       (Std. err. adjusted for 564 clusters in id)
    ------------------------------------------------------------------------------
                 |               Robust
        LgAneedc | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             Lgm |    .115435   .0403987     2.86   0.004      .036255    .1946149
           _cons |  -.1761017    .473641    -0.37   0.710    -1.104421    .7522175
    ------------------------------------------------------------------------------
    
    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
              id |       564         564           0    *|
            year |         9           0           9     |
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation
    
    .         estimates store estPOISfe1id
    Initially I thought it had something to do with dropping singletons iteratively, but I believe both reghdfe and ppmlhdfe do that. So now I am even less sure why N differs between the two.
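
    One way to check, which is only a guess on my part: ppmlhdfe's log reports "(dropped 1519 observations that are either singletons or separated by a fixed effect)", and 6,593 − 5,074 = 1,519. Under Poisson, a panel whose outcome is zero in every year is perfectly predicted by its id fixed effect (separation), so ppmlhdfe drops the whole panel, while reghdfe keeps it. A minimal sketch to count such panels (max_y is a hypothetical helper variable):

    Code:
    * Count observations in panels where LgAneedc is zero in every year
    * of the estimation sample; these are candidates for separation by
    * the id fixed effect and would be dropped by ppmlhdfe only.
    egen double max_y = max(LgAneedc) if numbersibs > 1, by(id)
    count if max_y == 0 & numbersibs > 1
    drop max_y

    If that count (together with any singletons) comes to 1,519, it would account for the difference in N.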
    Thank you in advance for any help you may be able to offer.
    Sincerely,
    Sumedha