
  • Losing observations after predict

    Dear all,

    I am running a multilevel model with over 40,000 observations, but when I use the command 'predict if e(sample)' I lose roughly 37,000 of them. Can anyone tell me how that is possible?

    Many thanks in advance,

    Kris

  • #2
    Do you lose them at predict, or are they already excluded from your model? When you add "if e(sample)", predictions are made only for the observations that were used to fit your original model. You can easily check the number of observations in the model output.
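To check whether observations disappear at estimation or at predict, the two counts can be compared directly. This is a minimal sketch that uses Stata's example dataset bangladesh.dta (loaded with webuse) as a stand-in for the data in this thread; the model and the new variable name mu_hat are illustrative assumptions, so substitute your own model.

```stata
* Sketch on Stata's example data; replace the model with your own.
webuse bangladesh, clear
xtmelogit c_use urban age child* || district: urban, covariance(unstructured)

display "observations used in estimation: " e(N)
count if e(sample)                        // should equal e(N)

predict mu_hat if e(sample)               // mu (predicted mean) is the default
count if e(sample) & !missing(mu_hat)     // in-sample observations with a prediction
count if e(sample) & missing(mu_hat)      // in-sample observations without one
```

If the last count is not zero, the observations are being lost at predict rather than at estimation.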
    Best wishes

    (Stata 16.1 MP)



    • #3
      Hi Felix,

      No, I lose them at predict, not in my original model. That's what I find so strange. The number of observations in my model is far larger than the number of observations I get when I use predict, and I don't understand why.

      Any idea?



      • #4
        What model are you running? Are there any issues such as missing values or collinearity? Are effects estimated for every variable in your model? As the manual explains, predict has many options after multilevel models, so more details would help --> mixed postestimation
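One way to follow up on these questions is to check the model variables for missing values and to compare predict statistics that do and do not use the predicted random effects. A sketch under the same assumptions as before (example data from webuse bangladesh; mu_full, mu_fixed and the re* stub are hypothetical names). The fixedonly and reffects options are documented predict options after xtmelogit.

```stata
* Sketch on Stata's example data: compare predict statistics after xtmelogit.
webuse bangladesh, clear
xtmelogit c_use urban age child* || district: urban, covariance(unstructured)

* Missing values in the model variables themselves
misstable summarize c_use urban age child* district

* Predicted mean with and without the predicted random effects
predict mu_full if e(sample)                  // fixed part + predicted random effects
predict mu_fixed if e(sample), mu fixedonly   // fixed part only
predict re*, reffects                         // one variable per random effect

* Where do the predictions go missing?
misstable summarize mu_full mu_fixed re*
tabulate district if e(sample) & missing(mu_full)
```

If mu_fixed is complete in the estimation sample but mu_full is not, the missing values presumably come from the random-effects part of the prediction rather than from the covariates.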
        Last edited by Felix Bittmann; 27 Nov 2019, 07:55.
        Best wishes




        • #5
          Thanks. I am running an xtmelogit model (I use Stata 13).

          This is the command I run:

          xtmelogit depvar indvar controlvars || country: indvar, mle var cov(unstr)

          If you have an idea, it would be VERY much appreciated.



          • #6
            We don't expect you to show a dataset of your size, but even the syntax in #5 seems schematic rather than literal. You could and should show the exact command you issued and the output xtmelogit gave you, followed by the predict command you used. See also FAQ Advice #12.

            A possible problem here is that e(sample) refers to the last model fitted, so in practice you should run predict immediately after fitting the model.
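A sketch of this advice, again on the bangladesh example data with hypothetical names insample and m_full: store the sample indicator and the estimates as soon as the model is fitted, so that a later estimation command cannot silently change what e(sample) refers to. estimates store and estimates restore are standard Stata commands; the intermediate logit only stands in for any unrelated fit.

```stata
* Sketch on Stata's example data: keep predict tied to the intended model.
webuse bangladesh, clear
xtmelogit c_use urban age child* || district: urban, covariance(unstructured)
generate byte insample = e(sample)    // freeze the estimation sample right away
estimates store m_full                // save the xtmelogit results

logit c_use urban                     // any later fit replaces e() and e(sample)

estimates restore m_full              // make the xtmelogit results active again
predict mu_hat if e(sample)           // e(sample) is the xtmelogit sample again
count if insample & missing(mu_hat)
```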



            • #7
              Hello,

              Thanks for your help.

              So I run this model:

              xtmelogit q37binnew q45 level3 se2 se3a se5 se9 q43 elections fh gdppercapita corrupt pts || country: q45, mle var cov(unstr)


              Mixed-effects logistic regression               Number of obs      =     40416
              Group variable: country                         Number of groups   =        16

                                                              Obs per group: min =       886
                                                                             avg =    2526.0
                                                                             max =      4486

              Integration points =   7                        Wald chi2(12)      =    763.57
              Log likelihood = -24488.873                     Prob > chi2        =    0.0000

              ------------------------------------------------------------------------------
                 q37binnew |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
                       q45 |  -.0753204   .0334112    -2.25   0.024    -.1408052   -.0098356
                    level3 |  -.1690492   .0244839    -6.90   0.000    -.2170369   -.1210616
                       se2 |   .1705215   .0228523     7.46   0.000     .1257319    .2153111
                      se3a |   .0063606   .0008598     7.40   0.000     .0046754    .0080458
                       se5 |  -.0068202   .0121726    -0.56   0.575     -.030678    .0170377
                       se9 |   .0488059   .0238373     2.05   0.041     .0020856    .0955262
                       q43 |   .0381727   .0115362     3.31   0.001     .0155623    .0607832
                 elections |   .0995321   .0335796     2.96   0.003     .0337173    .1653468
                        fh |  -.3449709   .0921981    -3.74   0.000    -.5256758    -.164266
              gdppercapita |  -.0005423   .0000777    -6.98   0.000    -.0006945   -.0003901
                   corrupt |   2.503162   .1190163    21.03   0.000     2.269894     2.73643
                       pts |  -.1279449   .0639624    -2.00   0.045    -.2533088    -.002581
                     _cons |  -2.604731   .9337329    -2.79   0.005    -4.434814   -.7746484
              ------------------------------------------------------------------------------

              ------------------------------------------------------------------------------
                Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
              -----------------------------+------------------------------------------------
              country: Unstructured        |
                                  var(q45) |   .0158934   .0068146      .0068588    .0368288
                                var(_cons) |   8.940206   3.695588      3.976387    20.10048
                            cov(q45,_cons) |  -.2280286   .1201598     -.4635374    .0074801
              ------------------------------------------------------------------------------
              LR test vs. logistic regression:     chi2(3) =  3062.52   Prob > chi2 = 0.0000


              Then I do:

              predict mu if e(sample)

              (53432 missing values generated)
              (option mu assumed; predicted means)

              sum mu

                  Variable |       Obs        Mean    Std. Dev.       Min        Max
              -------------+--------------------------------------------------------
                        mu |      3017    .6675089    .2176398   .0060365    .829378


              So I suddenly go from 40416 observations to 3017.
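The numbers in this post can be reconciled with some arithmetic. Assuming the "(53432 missing values generated)" message counts every observation in memory where mu is missing, the dataset holds 3017 + 53432 observations, and the sketch below separates those excluded by if e(sample) from those inside the estimation sample that received no prediction. The macro names are only illustrative.

```stata
* Figures copied from the output in post #7.
local in_sample   = 40416   // Number of obs reported by xtmelogit
local predicted   = 3017    // Obs reported by -sum mu-
local missing_gen = 53432   // "(53432 missing values generated)" from predict

local dataset = `predicted' + `missing_gen'   // observations in memory
local outside = `dataset' - `in_sample'       // set to missing by -if e(sample)-
local lost    = `in_sample' - `predicted'     // in e(sample) but mu is missing

display "in memory: `dataset'; outside e(sample): `outside'; in e(sample) but missing: `lost'"
```

This gives 56449 observations in memory, 16033 outside e(sample), and 37399 inside e(sample) without a prediction, which matches the roughly 37,000 lost observations in the first post. On this reading the if e(sample) qualifier is not the culprit; predict itself returns missing values for most of the estimation sample.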



              • #8
                Thanks for the evidence. I am no expert on these models, but others may be able to comment or follow up with more precise questions.



                • #9
                  Thanks, Nick. I hope others can help me out.



                  • #10
                    Maybe you can try estimating a minimal model first, with only one or a few independent variables and no random effects. By building the model up stepwise, you can probably see when the problem occurs and focus on that aspect. Also, I see that you are running xtmelogit, which is an older command. Maybe there is a bug here? Can you test it with Stata 15 or 16? Are all updates installed? The newer command is melogit.
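The stepwise strategy can be scripted so that the count of missing predictions is printed at each stage. A sketch on the bangladesh example data (p1 to p4 are hypothetical names); the last step refits the same specification with melogit, which is available from Stata 13 onward, so it should also run in the version used in this thread.

```stata
* Sketch on Stata's example data: grow the model step by step.
webuse bangladesh, clear

* Step 1: ordinary logit, no random effects
logit c_use urban
predict p1 if e(sample), pr
count if e(sample) & missing(p1)

* Step 2: random intercept only
xtmelogit c_use urban || district:
predict p2 if e(sample)
count if e(sample) & missing(p2)

* Step 3: full covariate set plus a random slope
xtmelogit c_use urban age child* || district: urban, covariance(unstructured)
predict p3 if e(sample)
count if e(sample) & missing(p3)

* Step 4: the same model fitted with melogit
melogit c_use urban age child* || district: urban, covariance(unstructured)
predict p4 if e(sample), mu
count if e(sample) & missing(p4)
```

The first step whose count jumps above zero points to the part of the specification worth investigating.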
                    Best wishes

