Loss of observations when using eregress/heckman but not when using probit - differences in first stage

Christopher Hartwell

Join Date: Sep 2015

Posts: 8
#1

22 Mar 2025, 03:08

Hi all,

I've been searching this forum furiously, consulting the Stata manuals, etc., but cannot find an answer to this question. Something is happening under the hood of the commands heckman/eregress in my selection model that is causing a loss of observations but I cannot figure out what it is.

The problem in a nutshell: I have a heckman selection model that I can replicate in eregress with a selection equation. The selection equation models whether political violence occurs in a place or not, so Violence = set of covariates x1-x8. The selected instances of political violence then go into the second stage, where the outcome is a financial indicator.

When I do this, I receive 985 observations in the probit, which then results in a selected n of about 231.

HOWEVER, when I do the heckman "by hand," running the exact same probit (Violence = x1-x8) it gives me 104 more observations. This of course changes the second-stage considerably when I run OLS by hand.

I have pared down the covariates to the absolute minimum, but there is still a loss of observations between heckman/eregress and probit.

I have summarized the variables and they all have similar availability.

I have tried everything I could think of but cannot figure out what is going on under the hood of heckman/eregress to drop 104 observations consistently that are retained in probit. Is there any diagnostic I can run (I've already studied the first stage of the heckman to death) to figure out which observations are dropped and why?

Thanks!!!
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2120
#2

22 Mar 2025, 07:57

I'm at a loss without seeing what you typed and what Stata reported, including seeing a sample of your data using -dataex-. It seems possible you have missing data on some x variables but I can't be sure.
Christopher Hartwell

Join Date: Sep 2015

Posts: 8
#3

22 Mar 2025, 09:13

Hi Jeff,

The funny thing is that Stata doesn't report anything. Using ChatGPT and a few other sources, I was able to unpack which observations were missing. But I still have no idea WHY they are dropped from heckman and not probit. For example, I ran the following two commands using my variable names:

Code:

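* First stage by hand: probit for the selection equation
probit Y l.Y X1-X8, vce(r)
gen in_probit = e(sample)
* Same selection equation inside heckman, with outcome Y2
heckman Y2 X X1-X8, select(Y = l.Y X1-X8) vce(r) first difficult
gen in_heckman = e(sample)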

And then I try this

Code:

list Y l.Y X1-X8 Y2 if in_probit == 1 & in_heckman == 0

And I get a list of 81 observations that were dropped in the heckman but I cannot figure out why.

FYI, here are the dropped observations

	Y	l.Y	X1	X2	X3	X4	X5	X6	X7	X8
26	1	1	-2.00E-08	0	0	0	1	9.99072	0.0013201	5.333
64	1	1	-0.0124775	0	0	0	1	9.805324	0.0018018	5.038
69	1	1	0.0134961	0	0	0	1	9.805324	-0.0008756	5.141
76	1	0	0.013176	0	0	0	1	9.847922	0.0005184	5.063
90	1	1	0.0060672	0	0	0	1	10.56939	-0.0022686	5.076
98	1	1	1.06E-08	0	0	0	1	10.56939	-0.0006499	5
104	1	0	0.00597	0	0	0	1	9.725915	0.0028947	4.975
108	1	1	0.0059349	0	0	0	1	9.725915	0.0026726	4.975
112	1	0	0.0058653	0	0	0	1	10.48288	0.0008699	4.739
117	1	1	-0.002972	0	0	0	1	10.48288	-0.0006554	4.751
120	1	0	0.0175956	0	0	0	1	10.48288	-0.0007727	5.495
133	1	1	0.0060062	0	0	0	1	11.12551	0.0001583	5.722
135	1	0	0.0122701	0	0	0	1	11.12551	0.0009976	5.45
144	1	1	0.0178044	0	0	0	1	11.08526	0.0030622	5.208
159	1	1	-0.00597	1	0	0	1	10.98802	-0.0008081	4.926
165	1	1	-2.50E-08	1	0	0	1	9.953135	0.0002673	4.74
169	1	0	0.0175956	1	0	0	1	9.953135	-0.0006319	4.69
177	1	1	0.00597	1	0	0	1	9.024734	-0.00277	4.6
189	1	1	4.94E-09	0	0	0	1	9.741145	0.0005847	4.55
192	1	0	0.0059349	0	0	0	1	9.741145	0.0005286	4.65
201	1	1	-0.0060792	0	0	-0.00663	1	10.23117	-0.0006262	4.55
204	1	1	-2.43E-08	0	0	-0.02134	1	10.23117	-0.0003187	4.55
208	1	1	0.0060792	0	0	-0.007931	1	10.69363	0.0002146	4.5
225	1	0	0.0057437	1	0	0	1	10.96647	0.0022734	4.43
236	1	1	0.0026281	0	0	0	1	11.41261	-0.000878	4.35
244	1	0	0.0078024	0	0	0	1	9.587817	0.0001294	4.44
249	1	1	0.0026281	0	0	0	1	9.587817	0.0012408	4.42
250	1	1	-0.0026281	0	0	-0.000965	1	9.587817	0.0012379	4.42
253	1	1	-0.0026144	0	0	0	1	9.587817	0.0007695	4.42
262	1	1	2.54E-09	0	0	0	1	9.3299	-0.0007848	4.35
265	1	0	0	0	0	0.001931	1	9.3299	-0.0004233	4.35
271	1	1	-0.0026846	0	0	0	1	9.453757	0.0002984	4.33
276	1	1	0.0053051	0	0	0.000966	1	9.453757	0.0001436	4.34
282	1	1	0.002656	0	0	0	1	9.516795	0.0000913	4.26
283	1	1	0.002649	0	0	0	1	9.516795	0.0001108	4.25
285	1	1	5.06E-09	0	0	0	1	9.516795	0.0000792	4.21
289	1	0	0.0026281	0	0	0	1	9.516795	-0.0005441	4.21
298	1	1	1.25E-08	0	0	0	1	9.754756	-0.0025223	4.23
302	1	0	-0.0026212	0	0	0	1	9.754756	-0.0012421	4.48
312	1	1	-0.0039973	0	0	0	1	9.781151	0.0021107	4.5
314	1	0	1.25E-08	0	0	0	1	9.781151	0.0015554	4.49
320	1	1	-0.0026281	0	0	0	1	10.31917	-0.0006214	4.55
324	1	1	-0.0039973	0	0	0	1	10.31917	-0.0003268	4.76
326	1	1	-0.0026846	0	0	0	1	10.31917	0.0004268	4.69
331	1	1	-0.0791373	3	0	0	1	11.19313	0.0023777	5.56
336	1	1	0.0306858	3	0	0	1	11.19313	0.0019041	4.99
349	1	1	-1.81E-08	0	0	0	1	10.05316	-0.0016402	4.63
351	1	0	-1.81E-08	0	0	0	1	10.05316	-0.0008907	4.54
399	1	1	-1.29E-08	0	0	0	1	10.48531	0.0043546	4.45
419	1	1	-0.0057307	0	0	0	1	9.760483	0.003801	4.93
436	1	1	3.16E-08	0	0	0	1	10.97181	-0.0084768	4.6
472	1	1	0.0059348	0	0	0	1	11.02331	0.0002565	4.54
475	1	0	2.29E-08	0	0	0	1	11.02331	0.0007588	4.63
495	1	1	-0.0279088	0	0	0.034293	0	11.07554	-0.0006749	5.08
499	1	0	0.0062696	0	0	0.014597	0	11.15892	-0.0005387	5.06
502	1	0	-0.0030349	0	0	0.009797	0	11.15892	-0.0002446	5.16
527	1	1	-0.0097245	0	0	-0.203244	0	10.79366	-0.0018678	5.43
534	1	0	0.0468313	0	0	-0.033527	0	10.96685	-0.0003336	5.6
540	1	0	-0.0066007	0	0	0.012884	0	10.96685	0.0000945	5.49
541	1	1	-0.0200676	0	0	0.014389	0	10.96685	9.27E-06	5.62
542	1	1	0.013423	0	0	-0.022948	0	10.96685	-0.0000912	5.59
551	1	1	0.0938187	0	0	-0.011134	0	11.09985	-0.0002467	5.59
559	1	1	-0.0332256	0	0	0.012783	0	11.26846	0.0001442	5.74
572	1	0	-2.41E-08	0	0	0.004498	0	11.5571	0.0003074	5.9
582	1	0	-0.0190482	0	0	0.017864	0	11.29076	0.0003823	5.78
658	1	1	-0.0031008	0	0	0.022666	0	12.29425	-0.0002232	4.81
671	1	0	0.0098201	0	0	-0.0011	0	12.13862	-0.0002072	5.46
675	1	0	-0.0350914	0	0	-0.005993	0	12.13862	-0.0002037	6.33
686	1	1	0.0659579	0	0	-0.008606	0	12.25384	0.0001404	6.54
689	1	0	0.040822	0	0	-0.008639	0	12.50691	0.0003608	6.06
800	1	0	3.67E-08	0	0	0	0	12.3593	-0.0001734	5.21
822	1	0	0.0080322	0	0	0	0	12.99962	0.0001745	4.76
848	1	0	3.46E-09	2	0	0	0	12.7341	0.0000247	4.27
851	1	0	-0.0794067	2	0	0	0	12.7341	0.0000492	4.2
860	1	0	0.0576292	1	0	0	0	12.77736	0.0000942	4.31
863	1	0	0.0422004	1	0	0	0	12.77736	0.000089	4.17
878	1	0	0.0081633	0	0	0	0	12.0084	0.0000446	4.1
914	1	0	0	0	0	0	0	12.72456	-0.0001022	3.43
932	1	0	0	0	0	0	0	12.7767	-0.0001603	3.31
941	1	0	0	0	0	0	0	12.82372	-0.0001216	3.33
944	1	0	0	0	0	0	0	12.82372	-0.000105	3.47

Stata tells me nothing about what it dropped in the heckman. I just know that when I do the probit and then OLS by hand, I end up with far more observations than in the heckman, and I cannot for the life of me figure out why these 81 observations, all of which are of interest for the selection equation, are dropped.

Cheers,
CH

Comment

Jeff Wooldridge

Join Date: Apr 2014

Posts: 2120
#4

24 Mar 2025, 22:26

What is the variable X? Might it have missing data? As a general rule, it is typically frowned upon to omit X variables from the selection equation. I suspect data are missing on X, but you didn't show data on X or Y2.
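
One way to check this suspicion is sketched below. It assumes that in_probit and in_heckman were generated from e(sample) as in post #3, and that X and Y2 are the outcome-equation variables named there; the flag variable dropped is a hypothetical name introduced only for illustration. Every row in the posted list has Y = 1, which would be consistent with heckman excluding selected observations whose outcome-equation variables are missing, but that reading is an assumption until the counts below confirm it.

```stata
* Assumes in_probit and in_heckman exist (see post #3).
* "dropped" is a hypothetical helper variable, not part of the original code.

* Flag observations used by probit but not by heckman
generate byte dropped = in_probit == 1 & in_heckman == 0

* How many of the dropped observations are missing Y2 or X?
count if dropped & missing(Y2, X)

* Missing-value summary for all model variables among the dropped rows
misstable summarize Y2 X X1-X8 if dropped

* Check whether the dropped rows are all selected observations (Y == 1)
tabulate Y if dropped, missing
```

If the count matches the number of dropped observations, missing values in Y2 or X would account for the whole gap between the probit and heckman samples.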