  • Loss of observations when using eregress/heckman but not when using probit - differences in first stage

    Hi all,

    I've been searching this forum furiously, consulting the Stata manuals, etc., but cannot find an answer to this question. Something is happening under the hood of the commands heckman/eregress in my selection model that is causing a loss of observations but I cannot figure out what it is.

    The problem in a nutshell: I have a heckman selection model that I can replicate in eregress with a selection equation. The selection equation models whether political violence occurs in a place or not, so Violence = set of covariates x1-x8. This selects the instances of political violence for the second stage, where I relate them to a financial indicator.

    When I do this, I receive 985 observations in the probit, which then results in a selected n of about 231.

    HOWEVER, when I do the heckman "by hand," running the exact same probit (Violence = x1-x8) it gives me 104 more observations. This of course changes the second-stage considerably when I run OLS by hand.

    I have pared down the covariates to the absolute minimum, and there is still a loss of observations between heckman/eregress and probit.

    I have summarized the variables and they all have similar availability.

    I have tried everything I could think of but cannot figure out what is going on under the hood of heckman/eregress to drop 104 observations consistently that are retained in probit. Is there any diagnostic I can run (I've already studied the first stage of the heckman to death) to figure out which observations are dropped and why?

    Thanks!!!

  • #2
    I'm at a loss without seeing what you typed and what Stata reported, including seeing a sample of your data using -dataex-. It seems possible you have missing data on some x variables but I can't be sure.
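
    A minimal sketch of such a -dataex- call, assuming the selection variable is named Violence and the covariates are stored consecutively as x1 through x8 (both names are placeholders taken from the description in #1, not the actual variable names):

    ```stata
    * Post a reproducible extract of the first 20 observations.
    * Violence and x1-x8 are placeholder names; substitute the real variables.
    * If dataex is not available, it can be installed with: ssc install dataex
    dataex Violence x1-x8 in 1/20
    ```

    Adding the second-stage outcome and any regressors that appear only in the outcome equation to the variable list would make it possible to check those for missing values as well.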

    • #3
      Hi Jeff,

      The funny thing is that Stata doesn't report anything. Using ChatGPT and a few other sources, I was able to unpack which observations were missing. But I still have no idea WHY they are dropped from heckman and not probit. For example, I ran the following two blocks of code using my variable names:

      Code:
      probit Y  l.Y X1-X8, vce(r)
      gen in_probit = e(sample)
      heckman Y2 X X1-X8   , select( Y = l.Y X1-X8 ) vce(r) first difficult
      gen in_heckman = e(sample)
      And then I try this

      Code:
      list Y l.Y X1-X8 Y2 if in_probit == 1 & in_heckman == 0
      And I get a list of 81 observations that were dropped in the heckman but I cannot figure out why.

      FYI, here are the dropped observations
      obs Y l.Y X1 X2 X3 X4 X5 X6 X7 X8
      26 1 1 -2.00E-08 0 0 0 1 9.99072 0.0013201 5.333
      64 1 1 -0.0124775 0 0 0 1 9.805324 0.0018018 5.038
      69 1 1 0.0134961 0 0 0 1 9.805324 -0.0008756 5.141
      76 1 0 0.013176 0 0 0 1 9.847922 0.0005184 5.063
      90 1 1 0.0060672 0 0 0 1 10.56939 -0.0022686 5.076
      98 1 1 1.06E-08 0 0 0 1 10.56939 -0.0006499 5
      104 1 0 0.00597 0 0 0 1 9.725915 0.0028947 4.975
      108 1 1 0.0059349 0 0 0 1 9.725915 0.0026726 4.975
      112 1 0 0.0058653 0 0 0 1 10.48288 0.0008699 4.739
      117 1 1 -0.002972 0 0 0 1 10.48288 -0.0006554 4.751
      120 1 0 0.0175956 0 0 0 1 10.48288 -0.0007727 5.495
      133 1 1 0.0060062 0 0 0 1 11.12551 0.0001583 5.722
      135 1 0 0.0122701 0 0 0 1 11.12551 0.0009976 5.45
      144 1 1 0.0178044 0 0 0 1 11.08526 0.0030622 5.208
      159 1 1 -0.00597 1 0 0 1 10.98802 -0.0008081 4.926
      165 1 1 -2.50E-08 1 0 0 1 9.953135 0.0002673 4.74
      169 1 0 0.0175956 1 0 0 1 9.953135 -0.0006319 4.69
      177 1 1 0.00597 1 0 0 1 9.024734 -0.00277 4.6
      189 1 1 4.94E-09 0 0 0 1 9.741145 0.0005847 4.55
      192 1 0 0.0059349 0 0 0 1 9.741145 0.0005286 4.65
      201 1 1 -0.0060792 0 0 -0.00663 1 10.23117 -0.0006262 4.55
      204 1 1 -2.43E-08 0 0 -0.02134 1 10.23117 -0.0003187 4.55
      208 1 1 0.0060792 0 0 -0.007931 1 10.69363 0.0002146 4.5
      225 1 0 0.0057437 1 0 0 1 10.96647 0.0022734 4.43
      236 1 1 0.0026281 0 0 0 1 11.41261 -0.000878 4.35
      244 1 0 0.0078024 0 0 0 1 9.587817 0.0001294 4.44
      249 1 1 0.0026281 0 0 0 1 9.587817 0.0012408 4.42
      250 1 1 -0.0026281 0 0 -0.000965 1 9.587817 0.0012379 4.42
      253 1 1 -0.0026144 0 0 0 1 9.587817 0.0007695 4.42
      262 1 1 2.54E-09 0 0 0 1 9.3299 -0.0007848 4.35
      265 1 0 0 0 0 0.001931 1 9.3299 -0.0004233 4.35
      271 1 1 -0.0026846 0 0 0 1 9.453757 0.0002984 4.33
      276 1 1 0.0053051 0 0 0.000966 1 9.453757 0.0001436 4.34
      282 1 1 0.002656 0 0 0 1 9.516795 0.0000913 4.26
      283 1 1 0.002649 0 0 0 1 9.516795 0.0001108 4.25
      285 1 1 5.06E-09 0 0 0 1 9.516795 0.0000792 4.21
      289 1 0 0.0026281 0 0 0 1 9.516795 -0.0005441 4.21
      298 1 1 1.25E-08 0 0 0 1 9.754756 -0.0025223 4.23
      302 1 0 -0.0026212 0 0 0 1 9.754756 -0.0012421 4.48
      312 1 1 -0.0039973 0 0 0 1 9.781151 0.0021107 4.5
      314 1 0 1.25E-08 0 0 0 1 9.781151 0.0015554 4.49
      320 1 1 -0.0026281 0 0 0 1 10.31917 -0.0006214 4.55
      324 1 1 -0.0039973 0 0 0 1 10.31917 -0.0003268 4.76
      326 1 1 -0.0026846 0 0 0 1 10.31917 0.0004268 4.69
      331 1 1 -0.0791373 3 0 0 1 11.19313 0.0023777 5.56
      336 1 1 0.0306858 3 0 0 1 11.19313 0.0019041 4.99
      349 1 1 -1.81E-08 0 0 0 1 10.05316 -0.0016402 4.63
      351 1 0 -1.81E-08 0 0 0 1 10.05316 -0.0008907 4.54
      399 1 1 -1.29E-08 0 0 0 1 10.48531 0.0043546 4.45
      419 1 1 -0.0057307 0 0 0 1 9.760483 0.003801 4.93
      436 1 1 3.16E-08 0 0 0 1 10.97181 -0.0084768 4.6
      472 1 1 0.0059348 0 0 0 1 11.02331 0.0002565 4.54
      475 1 0 2.29E-08 0 0 0 1 11.02331 0.0007588 4.63
      495 1 1 -0.0279088 0 0 0.034293 0 11.07554 -0.0006749 5.08
      499 1 0 0.0062696 0 0 0.014597 0 11.15892 -0.0005387 5.06
      502 1 0 -0.0030349 0 0 0.009797 0 11.15892 -0.0002446 5.16
      527 1 1 -0.0097245 0 0 -0.203244 0 10.79366 -0.0018678 5.43
      534 1 0 0.0468313 0 0 -0.033527 0 10.96685 -0.0003336 5.6
      540 1 0 -0.0066007 0 0 0.012884 0 10.96685 0.0000945 5.49
      541 1 1 -0.0200676 0 0 0.014389 0 10.96685 9.27E-06 5.62
      542 1 1 0.013423 0 0 -0.022948 0 10.96685 -0.0000912 5.59
      551 1 1 0.0938187 0 0 -0.011134 0 11.09985 -0.0002467 5.59
      559 1 1 -0.0332256 0 0 0.012783 0 11.26846 0.0001442 5.74
      572 1 0 -2.41E-08 0 0 0.004498 0 11.5571 0.0003074 5.9
      582 1 0 -0.0190482 0 0 0.017864 0 11.29076 0.0003823 5.78
      658 1 1 -0.0031008 0 0 0.022666 0 12.29425 -0.0002232 4.81
      671 1 0 0.0098201 0 0 -0.0011 0 12.13862 -0.0002072 5.46
      675 1 0 -0.0350914 0 0 -0.005993 0 12.13862 -0.0002037 6.33
      686 1 1 0.0659579 0 0 -0.008606 0 12.25384 0.0001404 6.54
      689 1 0 0.040822 0 0 -0.008639 0 12.50691 0.0003608 6.06
      800 1 0 3.67E-08 0 0 0 0 12.3593 -0.0001734 5.21
      822 1 0 0.0080322 0 0 0 0 12.99962 0.0001745 4.76
      848 1 0 3.46E-09 2 0 0 0 12.7341 0.0000247 4.27
      851 1 0 -0.0794067 2 0 0 0 12.7341 0.0000492 4.2
      860 1 0 0.0576292 1 0 0 0 12.77736 0.0000942 4.31
      863 1 0 0.0422004 1 0 0 0 12.77736 0.000089 4.17
      878 1 0 0.0081633 0 0 0 0 12.0084 0.0000446 4.1
      914 1 0 0 0 0 0 0 12.72456 -0.0001022 3.43
      932 1 0 0 0 0 0 0 12.7767 -0.0001603 3.31
      941 1 0 0 0 0 0 0 12.82372 -0.0001216 3.33
      944 1 0 0 0 0 0 0 12.82372 -0.000105 3.47
      Stata tells me nothing about what it dropped in the Heckman, I just know when I do the probit then OLS by hand, I end up with far more observations than in the Heckman, and I cannot for the life of me figure out why these 81 observations - all of which are of interest for the selection equation - are dropped.

      Cheers,
      CH

      • #4
        What is the variable X? Might it have missing data? As a general rule, it is typically frowned upon to omit X variables from the selection equation. I suspect data are missing on X, but you didn't show data on X or Y2.
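
        One way to check this, as a rough sketch: it assumes the in_probit and in_heckman markers created in #3 are still in memory, and that Y2 and X are the outcome and the extra regressor from that heckman call. Every observation in the list above has Y = 1, so each is a selected case for which the outcome equation needs both Y2 and X.

        ```stata
        * Count dropped observations with a missing outcome or a missing
        * outcome-equation regressor.
        count if in_probit == 1 & in_heckman == 0 & missing(Y2)
        count if in_probit == 1 & in_heckman == 0 & missing(X)

        * Summarize missing values in both variables among the dropped observations.
        misstable summarize Y2 X if in_probit == 1 & in_heckman == 0
        ```

        If either count comes close to the number of dropped observations, missing values in the outcome equation rather than in the selection equation would be the likely explanation for the gap.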
