
  • To statistician and Stata developers: Is there a bias in margins command?

    Hi All,

    I came across the article below. Appendix 3 (bias in conventional predicted-probability formulas) seems to suggest that predicted probabilities and the corresponding confidence intervals calculated by the margins command (based on the beta estimates) are biased. The author proposes simulation as a remedy, which looks appealing, but I am not fully convinced yet, because the simulation starts from the same beta estimates (drawing a distribution around each one) and relies on its own assumptions. Also, the margins command uses the delta method to calculate standard errors, and the author seems to suggest that confidence intervals produced by simulation are better than those calculated with the delta method. If any of you can shed light on these matters, I would appreciate it; I am fond of the margins command and would like to have more confidence in using it. Thank you.

    Zelner, B. A. (2009). Using simulation to interpret results from logit, probit, and other nonlinear models. Strategic Management Journal, 30(12), 1335-1348.


    Best,
    An

  • #2
    The delta method used by -margins- can produce biased CIs and/or CI bounds outside [0, 1] (e.g., negative lower bounds), particularly when the predicted value is close to 0 or 1. I typically use simulation, but there is also -transform_margins- from Jeff Pitblado at Stata, which relies on -margins- to get the linear predictions and then transforms those. See this thread for details: https://www.statalist.org/forums/for...tic-regression

    hth,
    Jeph
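    The transform-then-back-transform idea can be illustrated with a minimal Python sketch (this is not the actual -transform_margins- code, and the numbers are made up): build the CI on the logit scale, where the normal approximation behaves better, then back-transform the endpoints so they must land in (0, 1).

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

def transformed_ci(p_hat, se_p, z=1.96):
    """CI built on the logit scale, then back-transformed.
    SE on the logit scale via the delta method: se_p / (p*(1-p))."""
    se_logit = se_p / (p_hat * (1 - p_hat))
    center = logit(p_hat)
    return expit(center - z * se_logit), expit(center + z * se_logit)

# Near the boundary, the plain delta-method CI on the probability
# scale can have a negative lower bound ...
p_hat, se_p = 0.03, 0.02
naive = (p_hat - 1.96 * se_p, p_hat + 1.96 * se_p)

# ... while the back-transformed CI always stays inside (0, 1).
lo, hi = transformed_ci(p_hat, se_p)
```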



    • #3
      Let's keep this post open. I want to know whether we should be concerned about the bias mentioned in Appendix 3, and whether simulation can really avoid it.

      Jeph, thank you very much for the comment on negative CIs and for introducing me to the transform_margins command and the thread. You are really helpful! I see the shortcoming of the delta method now (by the way, I am surprised that Stata hasn't fixed the issue yet). Even if we use transform_margins to get correct CIs, I think the author would still prefer simulation to transform_margins because of the bias described in Appendix 3. This is why I am leaving the post open.



      • #4
        The bias spoken of here is a problem with the delta method itself, not with any specific software. When the prediction is close to the boundary of the response, the better approach is to transform, compute the variance or CI on the transformed scale, and then transform back. This is what Jeff's command does. If you find yourself in a situation where margins doesn't work for your needs, you are free to use a different method, but you are expected to know that.



        • #5
          The delta method is subject to bias, which depends on whether it uses a numerical approximation or an analytical approach, and on whether the confidence intervals are based on asymptotic theory. It is not the only method with this problem; reliance on asymptotic theory is also a reason for the increasing use of bootstrapping.



          • #6
            I am still fascinated by one of the questions that hasn't been answered yet. Below I reframe and simplify it. I need to figure this out to justify commands such as margins and nlcom.

            1. Assume we obtain an "unbiased" BETA estimate after logit and use it to calculate probability 1 through a known formula (which is a function of BETA). Why is probability 1 a "biased" estimate of the true population probability?

            2. Assume we obtain an "unbiased" BETA estimate after logit, use simulation to draw a distribution of betas with mean equal to the "unbiased" BETA estimate, calculate simulated probabilities for all simulated betas, and then take the mean of these simulated probabilities (probability 2). Why is probability 2 an "unbiased" estimate of the true population probability?

            Thank you Leonardo and Eric for further growing my knowledge on standard errors and the delta method.

            The delta method is subject to bias, which depends on whether it uses a numerical approximation or an analytical approach, and on whether the confidence intervals are based on asymptotic theory.
            Eric, if you don't mind, could you please elaborate on the "analytical approach"? It might be relevant to something I read yesterday (just in case).
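            The nonlinearity behind question 2 can be seen in a toy simulation (a Python sketch with made-up numbers, not Zelner's actual procedure): averaging the inverse-logit over draws centered at beta hat does not reproduce the inverse-logit of beta hat itself, because the transformation is nonlinear (Jensen's inequality), so "probability 1" and "probability 2" generally differ.

```python
import random, math

def expit(x):
    return 1 / (1 + math.exp(-x))

random.seed(1)
beta_hat, se = 2.0, 0.8   # hypothetical logit estimate and its SE

# "Probability 1": plug the point estimate into the known formula.
p1 = expit(beta_hat)

# "Probability 2": average expit over draws from N(beta_hat, se^2),
# in the spirit of the simulation approach discussed above.
draws = [random.gauss(beta_hat, se) for _ in range(100_000)]
p2 = sum(expit(b) for b in draws) / len(draws)

# Because expit is nonlinear, E[expit(b)] != expit(E[b]) even though
# the draws are centered exactly at beta_hat.
print(p1, p2)   # here p2 < p1, since expit is concave for x > 0
```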



            • #7
              This is a good example of why I insist that my students use statistical terms precisely. The word "unbiased" has a very specific meaning: is the expected value of the estimator equal to the parameter it is supposed to be estimating? An estimator is unbiased if and only if E(b^) = b for all values of b. And, of course, in classical statistics the expected value is computed with respect to the sampling distribution.

              Applied to the current discussion, the first point is that none of the estimators obtained from probit, logit, or almost any nonlinear model is unbiased. So talking about "unbiased betas" is a non-starter -- even if we put it in quotes. The estimators are consistent under correct model specification. But the vast majority of consistent estimators are not unbiased. Therefore, discussing bias in the fitted probabilities is pretty meaningless. If we put in any values of the covariates then we obtain a consistent estimator of the actual probability.

              I think what Angelica is trying to express is that sometimes estimates can be outside the logical range of values. This is very different from bias. In fact, this phenomenon can happen even with an unbiased estimator. For example, if Y has the Uniform[0,b] distribution, two times the sample average, 2*Ybar, is unbiased for b, and yet we may see specific observations above this estimate.
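              The Uniform[0, b] example is easy to check by Monte Carlo (a Python sketch; the values of b and n are arbitrary): 2*Ybar averages out to b across repeated samples, yet in any single sample the estimate can fall below the sample's own maximum observation, i.e. an unbiased estimate can still take logically impossible values.

```python
import random

random.seed(42)
b, n, reps = 10.0, 20, 50_000

estimates = []
impossible = 0
for _ in range(reps):
    y = [random.uniform(0, b) for _ in range(n)]
    est = 2 * sum(y) / n          # 2*Ybar, unbiased for b
    estimates.append(est)
    if est < max(y):              # an observation exceeds the estimate of b
        impossible += 1

mean_est = sum(estimates) / reps
print(mean_est)        # close to b = 10: unbiasedness in action
print(impossible > 0)  # True: unbiasedness does not rule out impossible values
```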

              I don't see anything inherently wrong with the usual estimates of the probabilities computed by the margins command. The confidence intervals are a different matter. Applying the delta method to a nonlinear function often leads to a confidence interval that includes impossible values. The delta method uses a linear approximation in obtaining the first-order asymptotic distribution. This is the crux of the issue. If one uses a simulation method -- whether it is sampling different beta hats from its asymptotic distribution or applying the nonparametric bootstrap -- one can obtain logically consistent confidence intervals. In this sense, the simulated CIs can be more appealing. Alternatively, previous posts note that one can use a transformation-retransformation method to ensure the CIs are logically consistent for any sample of data.

              JW
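              The simulation approach mentioned above, sampling beta hats from the asymptotic distribution and reading off percentiles, can be sketched as follows (Python, with made-up numbers; an illustration of the idea, not a prescription):

```python
import random, math

def expit(x):
    return 1 / (1 + math.exp(-x))

random.seed(0)
beta_hat, se = -3.0, 0.9   # hypothetical logit-scale estimate near the boundary

# Percentile CI from the simulated sampling distribution: draw beta from
# its asymptotic normal, push each draw through expit, take quantiles.
# Every draw lands in (0, 1), so the CI is logically consistent.
draws = sorted(expit(random.gauss(beta_hat, se)) for _ in range(20_000))
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]

# Compare with the plain delta-method CI on the probability scale,
# whose lower bound goes negative for these numbers.
p = expit(beta_hat)
se_p = se * p * (1 - p)           # delta method: d expit/dx = p*(1-p)
delta = (p - 1.96 * se_p, p + 1.96 * se_p)
print((lo, hi), delta)
```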



              • #8
                Most estimators lack small-sample, or exact, properties. This is what I meant by "based on asymptotic theory". Unbiasedness is a small-sample property, meaning that if it holds, it holds for any sample size. "Unbiased" is sometimes used as shorthand for "asymptotic unbiasedness", which in turn is sometimes used in place of consistency (the two concepts are not the same). This is what I thought Angelica meant. I was, and am, under the impression that the delta method does not always produce consistent estimators, and that even when it does, the estimators may have large small-sample biases.



                • #9
                  Eric: If the function of the parameters is continuously differentiable then the delta method produces consistent estimators of the asymptotic variances. So, in practice, it’s almost always consistent, and certainly for logit and probit. That it can lead to weird CIs in a particular sample is a different issue. Asymptotically, the coverage will be 95%. That’s why we need to use language carefully here.



                  • #10
                    Jeff, when I wrote "whether it uses a numerical approximation or an analytical approach", I was referring to the fact that for some time Stata used discrete numerical approximations rather than analytical derivatives. In that case, problems could arise even if the function is continuously differentiable. I am almost sure that, for instance, CIs for IRFs from VARs were not reliable.



                    • #11
                      Jeff, I am glad this thread has captured your attention too. Could you spend five minutes at most reading the two attachments? One is Appendix 3, and the other shows the simulation steps; they are screenshots from Zelner (2009). The full paper is too large to upload, and I don't think you need to read the whole thing. Like you, I "don't see anything inherently wrong with the usual estimates of the probabilities computed by the margins command". But Zelner (2009) writes that there is a bias, which is a separate issue from the CIs. Personally, I am not sure it is a severe bias, and I am not sure simulation can eliminate it. These are the things I am trying to figure out, together with the delta method issue.

                      Eric, thanks for defending me, ha ha. Jeff reminds me of the days of being pushed to improve my academic rigor. Such a fun time.

                      Particular thanks to Jeff for pointing out the "crux" of the delta method.

                      If needed, I can send the full paper to any of you by email or other means.
                      Attached Files
                      Last edited by Angelica Morgan; 20 Oct 2020, 10:45.



                      • #12
                        Angelica, these are the kinds of texts that do more harm than good. "It is well known that for large enough samples, beta hat is an unbiased estimator of the true population coefficient, beta". Wrong! Even worse is illustrating this with mu hat and the inverse of mu hat.
                        beta hat in the logit or probit model is not an unbiased estimator of beta: it is a consistent estimator (if certain assumptions are satisfied). Moreover, if beta hat is a consistent estimator of beta, then 1/beta hat is a consistent estimator of 1/beta (provided beta is not zero).



                        • #13
                          I would like to keep this post open to hear echoes or different opinions.

                          if beta hat is a consistent estimator of beta, so is 1/beta hat a consistent estimator 1/beta
                          This is my inference too. But when scholars say something different, I exercise caution and try to find proof for the counter-argument. After all, I am not a theorist in statistics, which is an agonizing pain. Also, I would like to be careful when moving from linearity to nonlinearity. Eric, are there any references or books we can use to prove it?



                          • #14
                            I don't have a reference at hand; I studied this many years ago. I don't know whether Jeff Wooldridge's advanced textbook (which he refers to on this list as his MIT textbook) has a chapter on asymptotic theory. The result that the inverse of a consistent estimator is also consistent (subject to regularity conditions) is an application of what is known as Slutsky's Theorem in probability and statistics.
                            On edit: Jeff Wooldridge's book does have a short chapter on asymptotic theory, and there is a discussion of Slutsky's Theorem.
                            Last edited by Eric de Souza; 20 Oct 2020, 12:01.
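                            For reference, the result Eric mentions is usually stated as follows (strictly speaking this is the continuous mapping theorem, which is closely related to Slutsky's theorem): if beta hat is consistent for beta and g is continuous at beta, then g(beta hat) is consistent for g(beta). Taking g(x) = 1/x, which is continuous wherever beta is nonzero, gives the claim about 1/beta hat.

```latex
\hat\beta_n \xrightarrow{\,p\,} \beta
\ \text{ and } g \text{ continuous at } \beta
\quad\Longrightarrow\quad
g(\hat\beta_n) \xrightarrow{\,p\,} g(\beta),
\qquad \text{e.g. } g(x) = 1/x \ \text{ for } \beta \neq 0.
```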



                            • #15
                              This thread still welcomes input, with no time limit. Many thanks.

                              Thank you Eric, I will check asymptotic theory and Slutsky's theorem.

