Losing observations after predict

Kris Ruijgrok

Join Date: Nov 2019

Posts: 18
#1

27 Nov 2019, 07:18

Dear all,

I am running a multilevel model with over 40,000 observations, but when I use the command 'predict if e(sample)' I lose roughly 37,000 of them. Can anyone tell me how that is possible?

Many thanks in advance,

Kris
Felix Bittmann

Join Date: Aug 2018

Posts: 616
#2

27 Nov 2019, 07:36

Do you lose them at predict, or are they already missing in your model? When you add "if e(sample)", predictions are made only for the observations that were used in estimating your original model. You can easily check the number of observations there.
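A minimal sketch of that check, assuming it is run directly after the xtmelogit command so that e(N) and e(sample) still refer to that model (nothing here depends on your variable names):

```stata
* Run directly after the xtmelogit command
count                   // all observations currently in memory
display e(N)            // observations actually used in the estimation
count if e(sample)      // observations flagged by e(sample); should equal e(N)
```

If the first count is much larger than e(N), the observations were already dropped at estimation (typically because of missing values in some model variable), not at predict.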

Best wishes

(Stata 16.1 MP)
Kris Ruijgrok

Join Date: Nov 2019

Posts: 18
#3

27 Nov 2019, 07:45

Hi Felix,

No, I lose them at predict, not in my original model. That's what I find so strange. The number of observations in my model is far larger than the number of observations I get when I use predict, and I don't understand why.

Any idea?
Felix Bittmann

Join Date: Aug 2018

Posts: 616
#4

27 Nov 2019, 07:52

What model are you running? Are there any issues such as missing values or collinearity? Are effects estimated for every variable in your model? As the manual explains, predict has many options after multilevel models, so more details would help; see mixed postestimation.
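One way to narrow this down, sketched under the assumption that it runs right after the model and that the names mu_full and mu_fixed are free (they are only illustrative): compare the default predicted means, which include the predicted random effects, with predictions from the fixed part of the model only, and count how many estimation-sample observations are missing in each.

```stata
* Illustrative variable names; run right after fitting the model
predict mu_full if e(sample), mu              // fixed part plus predicted random effects
predict mu_fixed if e(sample), mu fixedonly   // fixed part only, random effects set to zero
count if e(sample)                            // size of the estimation sample
count if e(sample) & missing(mu_full)         // missing among full predictions
count if e(sample) & missing(mu_fixed)        // missing among fixed-only predictions
```

If only mu_full has missing values, that would point to the random-effects part of the prediction rather than to the fixed part.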

Last edited by Felix Bittmann; 27 Nov 2019, 07:55.

Best wishes

(Stata 16.1 MP)
Kris Ruijgrok

Join Date: Nov 2019

Posts: 18
#5

27 Nov 2019, 07:58

Thanks. I am running an xtmelogit model (I use Stata 13).

This is the command I run:

xtmelogit depvar indvar controlvars || country: indvar, mle var cov(unstr)

If you have an idea, it would be VERY much appreciated.
Nick Cox

Join Date: Mar 2014

Posts: 35211
#6

27 Nov 2019, 08:09

We don't expect you to show a dataset of your size, but even the syntax in #5 looks schematic rather than literal. You could and should show the exact command you issued, the output xtmelogit gave you, and the predict command you used afterwards. See also FAQ Advice #12.

A possible problem here is that e(sample) refers to the last model fitted, so in practice you should run predict immediately after fitting the model.
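A sketch of a safer workflow, assuming it runs right after the model is fitted; m1, insample, and mu_m1 are hypothetical names. Storing the results and copying e(sample) into a variable means a later estimation command cannot silently change which model predict uses:

```stata
* Run right after fitting the multilevel model
estimates store m1                   // save these results, including e(sample)
generate byte insample = e(sample)   // keep a permanent copy of the estimation sample

* ... other commands, possibly including other estimations ...

estimates restore m1                 // make m1 the active estimation results again
predict mu_m1 if insample, mu        // predictions from m1 for its own estimation sample
count if insample & missing(mu_m1)   // estimation-sample observations without a prediction
```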
Kris Ruijgrok

Join Date: Nov 2019

Posts: 18
#7

27 Nov 2019, 09:18

Hello,

Thanks for your help.

So I run this model:

xtmelogit q37binnew q45 level3 se2 se3a se5 se9 q43 elections fh gdppercapita corrupt pts || country: q45, mle var cov(unstr)

Mixed-effects logistic regression               Number of obs      =     40416
Group variable: country                         Number of groups   =        16

                                                Obs per group: min =       886
                                                               avg =    2526.0
                                                               max =      4486

Integration points =   7                        Wald chi2(12)      =    763.57
Log likelihood = -24488.873                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
   q37binnew |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         q45 |  -.0753204   .0334112    -2.25   0.024    -.1408052   -.0098356
      level3 |  -.1690492   .0244839    -6.90   0.000    -.2170369   -.1210616
         se2 |   .1705215   .0228523     7.46   0.000     .1257319    .2153111
        se3a |   .0063606   .0008598     7.40   0.000     .0046754    .0080458
         se5 |  -.0068202   .0121726    -0.56   0.575     -.030678    .0170377
         se9 |   .0488059   .0238373     2.05   0.041     .0020856    .0955262
         q43 |   .0381727   .0115362     3.31   0.001     .0155623    .0607832
   elections |   .0995321   .0335796     2.96   0.003     .0337173    .1653468
          fh |  -.3449709   .0921981    -3.74   0.000    -.5256758    -.164266
gdppercapita |  -.0005423   .0000777    -6.98   0.000    -.0006945   -.0003901
     corrupt |   2.503162   .1190163    21.03   0.000     2.269894     2.73643
         pts |  -.1279449   .0639624    -2.00   0.045    -.2533088    -.002581
       _cons |  -2.604731   .9337329    -2.79   0.005    -4.434814   -.7746484
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
country: Unstructured        |
                    var(q45) |   .0158934   .0068146      .0068588    .0368288
                  var(_cons) |   8.940206   3.695588      3.976387    20.10048
              cov(q45,_cons) |  -.2280286   .1201598     -.4635374    .0074801
------------------------------------------------------------------------------
LR test vs. logistic regression:     chi2(3) =  3062.52   Prob > chi2 = 0.0000

Then I do:

predict mu if e(sample)

(53432 missing values generated)
(option mu assumed; predicted means)

sum mu

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          mu |       3017    .6675089    .2176398   .0060365    .829378

So I suddenly go from 40416 observations to 3017.
Nick Cox

Join Date: Mar 2014

Posts: 35211
#8

27 Nov 2019, 09:56

Thanks for the evidence. I am no kind of expert on these models, but others may be able to comment or follow up with more precise questions.
Kris Ruijgrok

Join Date: Nov 2019

Posts: 18
#9

27 Nov 2019, 10:25

Thanks Nick, I hope others can help me out.
Felix Bittmann

Join Date: Aug 2018

Posts: 616
#10

28 Nov 2019, 01:48

Maybe you could first estimate a minimal model, with only one or a few independent variables and no random effects. By building up the model step by step, you can probably see where the problem occurs and focus on that. I also see that you are using xtmelogit, which is an older command. Maybe there is a bug here? Can you test it with Stata 15 or 16? Are all updates installed? The newer command is melogit.
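A sketch of that stepwise approach, using the variable names from #7; p1, p2, and p3 are illustrative names, and the remaining controls would then be added back a few at a time. Each step counts estimation-sample observations that did not receive a prediction:

```stata
* Step 1: ordinary logit, no random effects
logit q37binnew q45
predict p1 if e(sample)               // predicted probabilities
count if e(sample) & missing(p1)

* Step 2: random intercept for country with xtmelogit
xtmelogit q37binnew q45 || country:
predict p2 if e(sample), mu           // predicted means
count if e(sample) & missing(p2)

* Step 3: the same random-intercept model with melogit
melogit q37binnew q45 || country:
predict p3 if e(sample), mu           // predicted means
count if e(sample) & missing(p3)
```

If the missing predictions appear only from a particular step onward, that step is the one to look at more closely.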

Best wishes

(Stata 16.1 MP)