  • e(sample) function

    Hello everyone,
    I used the e(sample) function to check which observations of my panel data set can be used for regression. Can anyone explain to me why generate sample=e(sample) returns a "0" for the observation with "PERMNO" = 90215 and "YearEffective" = 2003? From my point of view there is no reason to exclude it from the regression.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(PERMNO YearEffective) int patents float(xrdintensity ln_emp) int AcquirorPrimarySICCode str4 curcd byte Numberofmergers float sample
    90215 2002 0  .11715776         . 7372 "USD" 0 0
    90215 2003 0  .07955571  .4173937 7372 "USD" 0 0
    90215 2004 0 .035016168  .5692832 7372 "USD" 0 1
    90215 2005 0  .05366315  .8346468 7372 "USD" 0 1
    90215 2006 0  .06710567 1.1216775 7372 "USD" 1 1
    90215 2007 1  .05856499  1.282599 7372 "USD" 0 1
    90215 2008 0  .06725809 1.5186375 7372 "USD" 0 1
    90215 2009 5  .05361228 1.6032186 7372 "USD" 0 1
    90215 2010 9  .06078194 1.8415016 7372 "USD" 0 1
    end
    Having used the "fillin" command before running generate sample=e(sample), a "1" is returned for this observation, indicating it can be used for regression.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double(PERMNO YearEffective) int patents float(xrdintensity ln_emp) int AcquirorPrimarySICCode str4 curcd byte Numberofmergers float sample
    90215 2000 0          .         . 7372 ""    0 0
    90215 2001 0          .         . 7372 ""    0 0
    90215 2002 0  .11715776         . 7372 "USD" 0 0
    90215 2003 0  .07955571  .4173937 7372 "USD" 0 1
    90215 2004 0 .035016168  .5692832 7372 "USD" 0 1
    90215 2005 0  .05366315  .8346468 7372 "USD" 0 1
    90215 2006 0  .06710567 1.1216775 7372 "USD" 1 1
    90215 2007 1  .05856499  1.282599 7372 "USD" 0 1
    90215 2008 0  .06725809 1.5186375 7372 "USD" 0 1
    90215 2009 5  .05361228 1.6032186 7372 "USD" 0 1
    90215 2010 9  .06078194 1.8415016 7372 "USD" 0 1
    end
    Am I missing something, or is this some sort of bug?

    Thank you in advance for your help



    Chris
    Last edited by Christopher Weber; 06 Dec 2021, 11:52.

  • #2
    e(sample) looks back to the last model estimated -- in a sense I shall expand on.

    You seem to want to use it to look forwards. That will work in the way you want if and only if the observations to be used in your next model happen to be the same as those used in your last model, but even so, that's coincidence, not prescience. e(sample) does not check your data for missing values on the fly.

    The function is a bit of an odd duck because it never returns missing even when you might think it should.

    Code:
    . clear 
    
    . sysuse auto
    (1978 automobile data)
    
    . gen check = e(sample)
    
    . tab check
    
          check |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         74      100.00      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    My reading: This is as much linguistic as logical in that we are all encouraged to be idiomatic Statawise and write

    Code:
    ... if e(sample)
    -- or on occasion to use its negation --

    which would be the source of puzzling if not horrible bugs if e(sample) ever returned missing. (Recall that non-zero arguments -- hence numeric missing too -- are regarded as logically true.)

    Thus e(sample) being 0 in the example above is agnostic as well as factual, and means "there is no record in memory of these observations being used in a model fit".

    In a nutshell, e(sample) returns 1 if and only if an observation was used in the last model fit, which is almost always the result you want.
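
    For completeness, the idiomatic pattern is to tag or condition on the estimation sample immediately after fitting the model, before any further estimation command overwrites it. A minimal sketch with the auto data (the variable name used here is made up):

    Code:
    sysuse auto, clear
    regress price mpg weight
    * tag the estimation sample right away, before fitting anything else
    generate byte used = e(sample)
    * or condition follow-up commands on it directly
    summarize mpg weight if e(sample)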



    • #3
      If you run the following code immediately after the estimation command, then you can be sure that the observation was (not) included in the sample.

      Code:
      generate sample=e(sample)
      What we cannot tell you is why it was (not) included, because we cannot see the precise command used to fit your model. Please show us exactly the model you tried to fit; showing the output of that model wouldn't hurt either.



      • #4
        Thank you for your answers.
        The background of using e(sample) was the following:

        I ran one regression without the "fillin" command, and it reported 881 observations.
        After that I ran the same regression with "fillin" applied in advance, and it reported 882 observations.
        To find which observation was dropped in the first case, I used e(sample) after each of the regressions above, looking for the one differing observation. It turned out to be the observation mentioned above. Even if e(sample) is a "looking back" function, there should not be any difference between the two approaches, if I understood correctly.

        I am trying to conduct a poisson regression:
        Code:
        xtpoisson patents c.xrdintensity##cl.Numberofmergers c.xrdintensity##cl2.Numberofmergers ln_emp i.YearEffective i.AcquirorPrimarySICCode
        The resulting tables are the following:

        Without "fillin" command:


        Code:
        Fitting Poisson model:
        
        Iteration 0:   log likelihood = -1723819.7  (not concave)
        Iteration 1:   log likelihood = -1448008.7  
        Iteration 2:   log likelihood = -1308862.9  (backed up)
        Iteration 3:   log likelihood = -813581.19  (backed up)
        Iteration 4:   log likelihood = -797918.29  (backed up)
        Iteration 5:   log likelihood = -754367.22  (backed up)
        Iteration 6:   log likelihood = -737182.98  (backed up)
        Iteration 7:   log likelihood = -531296.37  
        Iteration 8:   log likelihood =  -48261.64  
        Iteration 9:   log likelihood = -27180.276  
        Iteration 10:  log likelihood = -19897.037  
        Iteration 11:  log likelihood = -19708.139  
        Iteration 12:  log likelihood = -19706.767  
        Iteration 13:  log likelihood = -19706.767  
        
        Fitting full model:
        
        Iteration 0:   log likelihood =  -6963.042  
        Iteration 1:   log likelihood = -6389.6288  
        Iteration 2:   log likelihood = -6235.5232  
        Iteration 3:   log likelihood = -6159.9047  
        Iteration 4:   log likelihood = -6158.7126  
        Iteration 5:   log likelihood = -6158.7104  
        Iteration 6:   log likelihood = -6158.7104  
        
        Random-effects Poisson regression                   Number of obs    =     881
        Group variable: PERMNO                              Number of groups =     124
        
        Random effects u_i ~ Gamma                          Obs per group:
                                                                         min =       1
                                                                         avg =     7.1
                                                                         max =       9
        
                                                            Wald chi2(38)    = 8532.18
        Log likelihood = -6158.7104                         Prob > chi2      =  0.0000
        
        ----------------------------------------------------------------------------------------------------
                                   patents | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -----------------------------------+----------------------------------------------------------------
                              xrdintensity |  -1.120416   .1517942    -7.38   0.000    -1.417927   -.8229044
                                           |
                           Numberofmergers |
                                       L1. |  -.1026452   .0091916   -11.17   0.000    -.1206604   -.0846299
                                           |
         c.xrdintensity#cL.Numberofmergers |   1.120803    .093779    11.95   0.000     .9369991    1.304606
                                           |
                              xrdintensity |          0  (omitted)
                                           |
                           Numberofmergers |
                                       L2. |  -.0541916   .0071142    -7.62   0.000    -.0681351    -.040248
                                           |
        c.xrdintensity#cL2.Numberofmergers |   .5035829   .0660595     7.62   0.000     .3741086    .6330572
                                           |
                                    ln_emp |   1.277215   .0202322    63.13   0.000     1.237561     1.31687
                                           |
                             YearEffective |
                                     2003  |    .070271   .0138705     5.07   0.000     .0430854    .0974567
                                     2004  |   .0638179   .0138225     4.62   0.000     .0367263    .0909096
                                     2005  |   .0238406    .014322     1.66   0.096    -.0042299    .0519111
                                     2006  |   .2299319   .0147869    15.55   0.000     .2009501    .2589137
                                     2007  |   .0705921   .0153488     4.60   0.000     .0405091    .1006751
                                     2008  |   .1167685   .0145884     8.00   0.000     .0881758    .1453612
                                     2009  |   .2626573    .013854    18.96   0.000      .235504    .2898106
                                     2010  |   .2742128   .0140069    19.58   0.000     .2467598    .3016658
                                           |
                    AcquirorPrimarySICCode |
                                     2836  |   -.242717   .6880069    -0.35   0.724    -1.591186    1.105752
                                     3571  |   .0972441   1.070966     0.09   0.928     -2.00181    2.196298
                                     3572  |   .6372489   .8229056     0.77   0.439    -.9756164    2.250114
                                     3577  |    .379349   .5993693     0.63   0.527    -.7953932    1.554091
                                     3661  |   .6313768   .5439139     1.16   0.246    -.4346749    1.697428
                                     3663  |   .6232291   .8209094     0.76   0.448    -.9857237    2.232182
                                     3669  |    .280968   1.073564     0.26   0.794    -1.823179    2.385115
                                     3672  |   -5.51969    1.12849    -4.89   0.000     -7.73149   -3.307889
                                     3674  |   1.851899   .4993496     3.71   0.000     .8731915    2.830606
                                     3812  |  -.2202493   1.077741    -0.20   0.838    -2.332582    1.892084
                                     3826  |  -.0475303   1.088496    -0.04   0.965    -2.180943    2.085883
                                     3829  |  -1.737129    .827428    -2.10   0.036    -3.358858   -.1154001
                                     3841  |   .0801203   .6631733     0.12   0.904    -1.219675    1.379916
                                     3845  |    -1.2565   .8692749    -1.45   0.148    -2.960247    .4472476
                                     4812  |   .6747005   1.075412     0.63   0.530    -1.433068    2.782469
                                     4813  |  -.5512871   .8906515    -0.62   0.536    -2.296932    1.194358
                                     7371  |   .0041362   1.074446     0.00   0.997    -2.101739    2.110011
                                     7372  |  -.1760911   .4688733    -0.38   0.707    -1.095066    .7428837
                                     7373  |  -1.586968   .6820805    -2.33   0.020    -2.923821   -.2501151
                                     7374  |   .5435725   1.074358     0.51   0.613     -1.56213    2.649275
                                     7375  |  -1.113282   .8751881    -1.27   0.203    -2.828619    .6020556
                                     7376  |   -.257933   1.089939    -0.24   0.813    -2.394174    1.878308
                                     7379  |  -.3648681   1.074097    -0.34   0.734    -2.470059    1.740322
                                     8731  |   2.474682   1.070612     2.31   0.021     .3763207    4.573044
                                           |
                                     _cons |    .695324   .4478711     1.55   0.121    -.1824873    1.573135
        -----------------------------------+----------------------------------------------------------------
                                  /lnalpha |  -.0585005   .1282633                      -.309892    .1928909
        -----------------------------------+----------------------------------------------------------------
                                     alpha |   .9431777   .1209751                      .7335262    1.212751
        ----------------------------------------------------------------------------------------------------
        LR test of alpha=0: chibar2(01) = 2.7e+04              Prob >= chibar2 = 0.000




        With "fillin" command:

        Code:
        Fitting Poisson model:
        
        Iteration 0:   log likelihood = -1726119.1  (not concave)
        Iteration 1:   log likelihood = -1449940.2  
        Iteration 2:   log likelihood = -1306970.8  (backed up)
        Iteration 3:   log likelihood = -773189.47  (backed up)
        Iteration 4:   log likelihood = -757200.73  (backed up)
        Iteration 5:   log likelihood = -711824.79  (backed up)
        Iteration 6:   log likelihood = -682899.62  
        Iteration 7:   log likelihood = -497600.58  
        Iteration 8:   log likelihood = -51084.657  
        Iteration 9:   log likelihood = -30613.987  
        Iteration 10:  log likelihood = -19877.429  
        Iteration 11:  log likelihood = -19713.303  
        Iteration 12:  log likelihood = -19712.481  
        Iteration 13:  log likelihood = -19712.481  
        
        Fitting full model:
        
        Iteration 0:   log likelihood = -6964.4037  
        Iteration 1:   log likelihood = -6388.0701  
        Iteration 2:   log likelihood = -6233.6999  
        Iteration 3:   log likelihood = -6160.5029  
        Iteration 4:   log likelihood = -6159.2998  
        Iteration 5:   log likelihood = -6159.2972  
        Iteration 6:   log likelihood = -6159.2972  
        
        Random-effects Poisson regression                   Number of obs    =     882
        Group variable: PERMNO                              Number of groups =     124
        
        Random effects u_i ~ Gamma                          Obs per group:
                                                                         min =       1
                                                                         avg =     7.1
                                                                         max =       9
        
                                                            Wald chi2(38)    = 8534.53
        Log likelihood = -6159.2972                         Prob > chi2      =  0.0000
        
        ----------------------------------------------------------------------------------------------------
                                   patents | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
        -----------------------------------+----------------------------------------------------------------
                              xrdintensity |  -1.120653   .1517961    -7.38   0.000    -1.418168   -.8231378
                                           |
                           Numberofmergers |
                                       L1. |   -.102641   .0091917   -11.17   0.000    -.1206563   -.0846257
                                           |
         c.xrdintensity#cL.Numberofmergers |   1.120783   .0937794    11.95   0.000     .9369785    1.304587
                                           |
                              xrdintensity |          0  (omitted)
                                           |
                           Numberofmergers |
                                       L2. |  -.0542042   .0071142    -7.62   0.000    -.0681479   -.0402605
                                           |
        c.xrdintensity#cL2.Numberofmergers |   .5036948   .0660602     7.62   0.000     .3742193    .6331704
                                           |
                                    ln_emp |   1.277409   .0202313    63.14   0.000     1.237756    1.317062
                                           |
                             YearEffective |
                                     2003  |   .0702216   .0138705     5.06   0.000     .0430359    .0974073
                                     2004  |   .0638197   .0138225     4.62   0.000      .036728    .0909113
                                     2005  |   .0238386    .014322     1.66   0.096    -.0042319    .0519091
                                     2006  |   .2299213   .0147869    15.55   0.000     .2009395    .2589031
                                     2007  |   .0705804   .0153488     4.60   0.000     .0404974    .1006634
                                     2008  |   .1167531   .0145884     8.00   0.000     .0881604    .1453458
                                     2009  |   .2626372    .013854    18.96   0.000      .235484    .2897905
                                     2010  |   .2741835   .0140068    19.57   0.000     .2467306    .3016364
                                           |
                    AcquirorPrimarySICCode |
                                     2836  |  -.2427185   .6881567    -0.35   0.724    -1.591481    1.106044
                                     3571  |   .0966308   1.071207     0.09   0.928    -2.002897    2.196159
                                     3572  |   .6370681   .8230904     0.77   0.439    -.9761593    2.250296
                                     3577  |   .3789265   .5995018     0.63   0.527    -.7960753    1.553928
                                     3661  |   .6311128   .5440341     1.16   0.246    -.4351745      1.6974
                                     3663  |   .6227413   .8210935     0.76   0.448    -.9865723    2.232055
                                     3669  |   .2808093   1.073805     0.26   0.794     -1.82381    2.385429
                                     3672  |  -5.520446    1.12872    -4.89   0.000    -7.732695   -3.308196
                                     3674  |    1.85173   .4994594     3.71   0.000     .8728075    2.830652
                                     3812  |  -.2203885   1.077981    -0.20   0.838    -2.333192    1.892415
                                     3826  |  -.0476256   1.088734    -0.04   0.965    -2.181505    2.086254
                                     3829  |    -1.7375   .8276111    -2.10   0.036    -3.359588   -.1154126
                                     3841  |   .0799186   .6633207     0.12   0.904    -1.220166    1.380003
                                     3845  |  -1.256542   .8694582    -1.45   0.148    -2.960649    .4475648
                                     4812  |   .6746673   1.075653     0.63   0.531    -1.433573    2.782908
                                     4813  |  -.5512695   .8908221    -0.62   0.536    -2.297249     1.19471
                                     7371  |    .003051   1.074686     0.00   0.998    -2.103295    2.109397
                                     7372  |   -.176467   .4689757    -0.38   0.707    -1.095643    .7427085
                                     7373  |   -1.58726   .6822282    -2.33   0.020    -2.924403   -.2501176
                                     7374  |   .5434656   1.074599     0.51   0.613    -1.562709     2.64964
                                     7375  |  -1.113326   .8753653    -1.27   0.203     -2.82901    .6023589
                                     7376  |  -.2579278   1.090177    -0.24   0.813    -2.394635    1.878779
                                     7379  |  -.3651181   1.074338    -0.34   0.734    -2.470781    1.740545
                                     8731  |   2.474713   1.070854     2.31   0.021     .3758772    4.573548
                                           |
                                     _cons |   .6952956   .4479684     1.55   0.121    -.1827063    1.573298
        -----------------------------------+----------------------------------------------------------------
                                  /lnalpha |  -.0580438   .1282502                     -.3094096     .193322
        -----------------------------------+----------------------------------------------------------------
                                     alpha |   .9436086    .121018                      .7338801    1.213273
        ----------------------------------------------------------------------------------------------------
        LR test of alpha=0: chibar2(01) = 2.7e+04              Prob >= chibar2 = 0.000
        Best regards,

        Chris
        Last edited by Christopher Weber; 06 Dec 2021, 12:39.



        • #5
          The cause has to do with your use of lag operators for Numberofmergers. In the first case, the second lag is not defined for PERMNO 90215 in 2003, because L2.Numberofmergers there refers to the year 2001, which is not present in the data; the lag is therefore missing and the observation is dropped from estimation. -fillin- creates observations for the missing year combinations, resulting in a non-missing value of the lag for that PERMNO, and the observation is then included. You should ask yourself whether you really should have data for that PERMNO for the years added by -fillin-.
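
          A minimal sketch of the mechanism, using a made-up three-year panel:

          Code:
          clear
          input long id int year byte x
          1 2002 0
          1 2003 0
          1 2004 0
          end
          xtset id year
          generate x_l2 = L2.x
          list id year x x_l2, clean
          * x_l2 is missing in 2002 and 2003 because the years 2000 and
          * 2001 are not in the data; any model that includes L2.x will
          * therefore drop those observations.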
