biased coefficients

Prathvajeeth Rajmohan

Join Date: Aug 2017

Posts: 70
#1

biased coefficients

02 Sep 2017, 13:48

Hi there, this is more of a general econometrics question than a Stata-specific one, from a novice.

I see bias of coefficients referred to a lot.

In very simple terms, does "biased coefficients" mean their sign/magnitude is incorrect? Thanks
Clyde Schechter

Join Date: Apr 2014

Posts: 29913
#2

02 Sep 2017, 14:17

In very simple terms, does "biased coefficients" mean their sign/magnitude is incorrect?

That's an oversimplification, and the term "biased coefficient" is itself an abuse of language.

Nearly every coefficient is "incorrect" because we have only finite information from a data sample and there is always sampling error.

The correct terminology is not biased coefficient but that the estimation of the coefficient was biased.

In very general terms, suppose we have a population that is characterized by some numerical attribute that we would like to know. Let's call that value beta. The statistical term is that beta is the estimand, that which is to be estimated. We attempt to estimate beta by inferring its value from what we can observe in a sample from the population. Of course, the process of sampling is subject to random variation (or, in the real world, both random and systematic (ugh!) variation). Anyway, we get our sample, do some kind of calculation on it, and come out with an estimate of beta; let's call it b. Now, it is clear that b is a random variable: at the very least it incorporates variation due to the random aspect of sampling, it may incorporate additional variation from systematic sources, and even the calculations might be partly indeterminate. We can imagine repeating this exercise many times and getting a bunch of different estimates b1, b2, b3, ..., bn, .... These b's are all draws from the distribution of the random variable that our sampling and calculations generate. That distribution is known as the sampling distribution of the estimation process. Note that, in general, these b's will differ from each other, and nearly all of them will differ from beta.

The estimation process is called unbiased if E(b) = beta, and biased otherwise.

A shorter way of saying this is that the estimation is biased if the long run average of estimates obtained by repeatedly resampling and re-estimating is not the true value of the population parameter.

So an unbiased estimate is not necessarily accurate. In fact, it is almost never exactly correct. All one can say is that a series of unbiased estimates will, on average, be correct. This is like the two statisticians who went out hunting for deer. They spot one in the distance and both shoot. One of them misses by 50 yards to the left, and the other misses by 50 yards to the right. They celebrate their triumph because, on average, they got it! That is all an unbiased estimate is.
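The resampling thought experiment above can be sketched in a few lines of Python. This is only an illustration under assumptions not in the post: a hypothetical normal population with beta = 10 and standard deviation 3, samples of size 20, and the sample mean as the estimator. Each replication draws a fresh sample and re-estimates; the individual estimates scatter around beta, while their long-run average lands very close to it.

```python
import random
import statistics

BETA = 10.0  # hypothetical true population mean (the estimand)

def sampling_distribution(estimator, draw, n, reps, seed=0):
    """Repeat 'draw a sample of size n, then estimate' reps times; return every estimate b."""
    rng = random.Random(seed)
    return [estimator([draw(rng) for _ in range(n)]) for _ in range(reps)]

# Hypothetical population: normal with mean BETA and standard deviation 3.
draw = lambda rng: rng.gauss(BETA, 3.0)

# The sample mean is used here as an example of an unbiased estimator.
estimates = sampling_distribution(statistics.fmean, draw, n=20, reps=20_000)

print("a few individual estimates:", [round(b, 2) for b in estimates[:5]])
print("average over all replications:", round(statistics.fmean(estimates), 3))
```

Almost none of the individual b's equals 10, yet their average is close to 10, which is all that unbiasedness promises.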

A biased estimate may sometimes be preferable to an unbiased estimate if the associated estimation process has smaller variation than an unbiased process, even though its average result is off by a little. Think of two dart players. One of them, for whatever reason, systematically hits about an inch to the left of the center of the target, but is always within an inch of that mark. The other throws wildly: his shots are equally likely to be too far left as too far right, and too far up as too far down (so, on average, he hits the center of the target), but he seldom comes closer than 3 inches to the center. The first dart player wins nearly every game.
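The dart analogy can be made concrete with a small simulation. The landing patterns are assumptions chosen to match the story, not anything stated precisely in the post: the steady player lands uniformly within a 1-inch disc centered 1 inch left of the bullseye, and the wild player lands in a random direction at a distance drawn uniformly between 3 and 6 inches. The average landing point shows the bias; the mean squared miss distance shows overall accuracy.

```python
import math
import random
import statistics

rng = random.Random(1)

def player_one():
    """Steady but biased: uniform over a 1-inch disc centered 1 inch left of the bullseye."""
    r = math.sqrt(rng.random())              # sqrt makes points uniform over the disc's area
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (-1.0 + r * math.cos(theta), r * math.sin(theta))

def player_two():
    """Unbiased but wild: random direction, always 3 to 6 inches from the bullseye."""
    r = rng.uniform(3.0, 6.0)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (r * math.cos(theta), r * math.sin(theta))

def summarize(throw, n=100_000):
    """Average landing point (shows bias) and mean squared miss distance (shows accuracy)."""
    shots = [throw() for _ in range(n)]
    mean_x = statistics.fmean(x for x, _ in shots)
    mean_y = statistics.fmean(y for _, y in shots)
    mean_sq_miss = statistics.fmean(x * x + y * y for x, y in shots)
    return (mean_x, mean_y), mean_sq_miss

for name, throw in [("steady but biased", player_one), ("unbiased but wild", player_two)]:
    (mx, my), msq = summarize(throw)
    print(f"{name}: average landing point = ({mx:.2f}, {my:.2f}), "
          f"mean squared miss = {msq:.2f} sq. in.")
```

Under these assumptions the steady player's mean squared miss is roughly 1.5 square inches (about 1 from the squared bias plus about 0.5 from spread), against roughly 21 for the wild player: the bias-variance trade-off the darts story describes.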

Last edited by Clyde Schechter; 02 Sep 2017, 14:21.
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#3

02 Sep 2017, 18:48

Since you refer to your question as an econometrics question, let me give the perspective of an econometrician.

A coefficient is unbiased if it is consistent. That means that if your coefficient converges to the true parameter as your observations go to infinity it is unbiased.

An OLS model will yield an unbiased estimate of a parameter if there is no omitted variable bias or simultaneity. That means the variable is not correlated with other variables that are correlated with your dependent variable, and that the dependent variable does not cause variation in the independent variable in a reverse or cyclical relationship. These violations of OLS assumptions are called endogeneity.
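The reverse-causality case can be illustrated with a simulated simultaneous system. This is a minimal sketch under invented parameters: the structural equations y = BETA*x + u and x = GAMMA*y + v, with BETA = 1, GAMMA = 0.5, and independent standard normal errors u and v. Because y feeds back into x, x ends up correlated with the error u, and the OLS slope of y on x drifts away from BETA.

```python
import random
import statistics

rng = random.Random(2)
BETA, GAMMA = 1.0, 0.5   # hypothetical structural parameters
N = 100_000

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

xs, ys = [], []
for _ in range(N):
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    # Structural model: y = BETA*x + u and x = GAMMA*y + v (y feeds back into x).
    # Solving the two equations jointly shows that x depends on the error u.
    x = (GAMMA * u + v) / (1.0 - GAMMA * BETA)
    xs.append(x)
    ys.append(BETA * x + u)

slope = ols_slope(xs, ys)
print(f"true beta = {BETA}, OLS slope = {slope:.3f}")   # about 1.2 with these parameters
```

With these parameters the slope settles near 1.2 rather than 1.0, and increasing N does not remove the gap.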

In practice, the economist assumes every independent variable is endogenous, and will not accept an estimate that is produced without causal inference methods. According to these assumptions, an estimate is biased unless it was produced by a randomized experiment, a difference-in-differences design, a regression discontinuity design, an instrumental variables estimate, or a contextual argument of strict exogeneity. A coefficient produced with one of these methods is said to be identified, and this is the standard of econometrics research. Without one of these frameworks, most economists will ignore your model, and it will be unpublishable in an econ journal except in very rare cases.

Last edited by Philip Gigliotti; 02 Sep 2017, 18:58.
Rich Goldstein

Join Date: Mar 2014

Posts: 4427
#4

02 Sep 2017, 19:01

Do econometricians really believe that consistency equals lack of bias? That seems very strange, since consistent but biased estimators have been known to exist for many years (e.g., ridge regression).
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#5

02 Sep 2017, 19:07

I'm sure some of the more cerebral people have a nuanced perspective on that. I'm just a grad student, but to me it seems the incentives are all aligned to find a natural experiment and call that parameter identified to get published somewhere respectable. I was at Donald Rubin's causal inference workshop at Northwestern this summer and I certainly didn't hear any discussion of consistent estimates remaining biased.
Clyde Schechter

Join Date: Apr 2014

Posts: 29913
#6

02 Sep 2017, 19:55

Just to give a simple example of a biased but consistent estimator, consider this:

Estimand: variance of a normal distribution with unknown mean.

Estimation process: Simple random sample. Estimator = Sum(x_i - sample mean)² / N.

This estimator is biased but consistent. Its expected value is too small by a factor of (N-1)/N (that is, it equals (N-1)/N times the true variance), which is why we usually use the formula with N-1 in the denominator. But in the limit as N -> infinity, it converges to the true value.
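The bias and its disappearance can be checked by simulation. This sketch assumes a hypothetical normal population with mean 7 (treated as unknown, so the sample mean is used) and variance 4, and averages the divide-by-N estimator over many samples at N = 5, 50, and 500. Python's `statistics.variance` uses the N-1 denominator and serves as the unbiased comparison.

```python
import random
import statistics

rng = random.Random(3)
MU, SIGMA2 = 7.0, 4.0   # hypothetical population: normal, mean 7 (unknown to the estimator), variance 4

def var_over_n(xs):
    """Sum(x_i - sample mean)^2 / N: biased, but consistent."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def average_estimate(estimator, n, reps):
    """Average of the estimator over `reps` simple random samples of size n."""
    return statistics.fmean(
        estimator([rng.gauss(MU, SIGMA2 ** 0.5) for _ in range(n)]) for _ in range(reps)
    )

for n in (5, 50, 500):
    reps = 200_000 // n
    avg = average_estimate(var_over_n, n, reps)
    print(f"N={n:3d}: mean of /N estimator = {avg:.3f}, "
          f"theory (N-1)/N * sigma^2 = {SIGMA2 * (n - 1) / n:.3f}")

# The familiar N-1 version (statistics.variance) removes the bias even at N = 5.
print(f"N=  5: mean of /(N-1) estimator = "
      f"{average_estimate(statistics.variance, 5, 40_000):.3f}")
```

At N = 5 the divide-by-N average sits near 3.2 instead of 4; by N = 500 the shortfall is under 0.01, which is the "biased but consistent" behavior described above.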

Here's another, that's simpler and doesn't really require much calculation to prove:

Estimand: The upper limit of a uniform distribution with known lower and unknown upper limit.

Estimation process: Simple random sample. Estimator: max x_i

Again, this estimator is clearly biased downward. But in the limit as N -> infinity, it is right on the nose, hence consistent.
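The uniform example is easy to simulate as well. This sketch assumes a hypothetical uniform distribution on [0, 10] (lower limit known, upper limit 10 to be estimated) and averages max x_i over repeated samples. For a uniform sample of size n the expected maximum is lower + (upper - lower) * n/(n + 1), so the estimate always falls short on average, but by less and less as n grows.

```python
import random
import statistics

rng = random.Random(4)
LOWER, UPPER = 0.0, 10.0   # hypothetical: lower limit 0 is known, upper limit 10 is the estimand

def mean_of_max(n, reps):
    """Average of the estimator max(x_i) over `reps` simple random samples of size n."""
    return statistics.fmean(
        max(rng.uniform(LOWER, UPPER) for _ in range(n)) for _ in range(reps)
    )

for n in (2, 10, 100, 1000):
    theory = LOWER + (UPPER - LOWER) * n / (n + 1)   # expected value of max for a uniform sample
    print(f"n={n:4d}: average of max = {mean_of_max(n, 1000):.3f}, theory = {theory:.3f}")
```

No single sample maximum can exceed the true upper limit, which is why the bias is always downward.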

The practices of the econometrics community that Philip Gigliotti describes are well known to most who follow this Forum regularly. Suffice it to say that these practices are not universally regarded as ideal. Some would describe the insistence on unbiased estimators as excessive, since it sometimes leads to shoehorning data into inappropriate models because a more appropriate model lacks an unbiased estimator (even though it may still provide accurate estimates).

In the broader statistical community there is a greater willingness to trade off different desirable properties of estimators, of which unbiasedness and consistency are only two. I would also say that in my occasional reading in the econometric literature (and it is very occasional, and probably not at all representative of the field) I have noticed that for all the concern about biased estimator formulas, there is sometimes a stunning indifference to, or lack of awareness of, bias introduced through the sampling process or through measurement procedures.
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#7

05 Sep 2017, 14:12

Let's differentiate between definitions and practices.

As a matter of definition, there is no doubt that there is a difference between bias and consistency. "An unbiased estimator of a parameter is one with a mathematical expectation that equals the true parameter value... The property of consistency ensures that the estimation rule will produce an estimate that is close to the true parameter value with high probability if the sample size is large enough" (Judge, Hill, Griffiths, Lutkepohl, Lee, Introduction to the Theory and Practice of Econometrics, 2nd ed., pp. 69, 84). In other words, with an unbiased estimator, the expected value of the parameter estimate equals the true value. With a consistent estimator, the parameter estimate converges to the true value as the sample size goes to infinity. That is why we have some estimators with small-sample properties and some with only large-sample properties. It is also why the derivation of properties for OLS looks quite different from the derivation of properties for estimators with only large-sample properties. This is simply using the terms correctly.

I also find the statement that "the economist assumes every independent variable is endogenous" truly bizarre. For example, while my physical sex at birth may influence many things in my life, I fully believe it is measured without error and is not influenced by the later activities it may explain. Many of the tools that econometricians emphasize currently are based on beliefs that certain independent variables (like specific events) are exogenous. Experimental treatments are ways the experimenter produces exogenous variables. In a world where everything is endogenous, empirical work is almost impossible - perhaps a good thing if one doesn't want to test one's theories.

There are a variety of debates over differences in research approach and over which estimators and which properties are more desirable, with different disciplines taking different approaches. All approaches have their blind spots. For example, Alice Rivlin (former Vice-Chair of the Federal Reserve, Director of the US Office of Management and Budget, and first head of the Congressional Budget Office) noted that building fancy models for bad data was the apex of the profession, while collecting better data was considered hack work. Some areas I have worked with are quite naive about their estimators but careful about measurement; others use complex estimators but are quite sloppy about measurement.

As George Box is quoted as saying, "All models are wrong, but some are useful." It is almost inconceivable that we can fully model the understanding of someone who has a detailed understanding of a system. Some modesty is appropriate.
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#8

05 Sep 2017, 15:45

Originally posted by Phil Bromiley

I also find the statement that "the economist assumes every independent variable is endogenous" truly bizarre. For example, while my physical sex at birth may influence many things in my life, I fully believe it is measured without error and is not influenced by the later activities it may explain.

How could gender not be endogenous? I tend to think of demographic variables as the most endogenous covariates. If you regress a dependent variable on gender, you have no way of knowing whether the observed relationship is a product of the exogenous effect of gender or of the countless physical and social traits that are correlated with gender.

This is the basis of the controversy over the wage gap between men and women. If you regress wage on female gender, you will likely observe a large negative effect that is highly significant. But if you control for marriage, children, and time away from work, the wage gap disappears. This is a textbook case of omitted variable bias.
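The general omitted-variable mechanism can be sketched with simulated data. Everything here is invented for illustration and says nothing about actual wage data: d is a hypothetical binary regressor with a true effect of -0.5, z is a hypothetical covariate that rises with d and has its own effect of -1.0 on y. Regressing y on d alone lets d absorb part of z's effect; controlling for z (done here via the Frisch-Waugh-Lovell residualization, to stay within the standard library) recovers the effect of d.

```python
import random
import statistics

rng = random.Random(5)
BETA_D, BETA_Z = -0.5, -1.0   # hypothetical true effects of d and of the omitted covariate z
N = 100_000

def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

d = [float(rng.random() < 0.5) for _ in range(N)]          # binary regressor of interest
z = [di + rng.gauss(0.0, 1.0) for di in d]                 # covariate correlated with d
y = [BETA_D * di + BETA_Z * zi + rng.gauss(0.0, 1.0) for di, zi in zip(d, z)]

short = ols_slope(d, y)   # z omitted from the regression

# Controlling for z (Frisch-Waugh-Lovell): residualize d on z, then regress y on the residual.
c = ols_slope(z, d)
a = statistics.fmean(d) - c * statistics.fmean(z)
d_resid = [di - (a + c * zi) for di, zi in zip(d, z)]
with_z = ols_slope(d_resid, y)

print(f"true effect of d: {BETA_D}")
print(f"y on d only (z omitted): {short:.3f}")     # pulled toward BETA_D + BETA_Z = -1.5 here
print(f"y on d, controlling for z: {with_z:.3f}")
```

In this setup the short regression lands near -1.5 and the controlled one near -0.5. Note that d itself is generated independently of everything else, so the bias comes from omitting z, not from d responding to y.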
Clyde Schechter

Join Date: Apr 2014

Posts: 29913
#9

05 Sep 2017, 15:55

This is a textbook case of omitted variable bias.

Yes it is. But sex is still exogenous. These are different concepts. An endogenous variable is one whose value is (or is believed to be) dependent upon the other variables (and especially the dependent variable or the error term) in the model.

If sex were endogenous, that would mean that in response to the higher wages paid to men, women changed their gender and became men. We do seem to see more people undergoing gender reassignment surgery these days, but as far as I know the changes occur about equally in both directions, and nobody has suggested that the wage gap is driving this. Indeed, the wage gap has declined somewhat over the past decades while gender reassignments have increased.
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#10

05 Sep 2017, 16:21

Originally posted by Clyde Schechter

An endogenous variable is one whose value is (or is believed to be) dependent upon the other variables (and especially the dependent variable or the error term) in the model.

If sex were endogenous, that would mean that in response to the higher wages paid to men, women changed their gender and became men.

In econometrics, that's called simultaneity. Endogeneity includes any correlation with the error term, including simultaneity, omitted variable bias and measurement error.

From Wikipedia:

In econometrics, an endogeneity problem occurs when an explanatory variable is correlated with the error term. Endogeneity can arise as a result of measurement error, autoregression with autocorrelated errors, simultaneous causality (see Instrumental variable) and omitted variables. Two common causes of endogeneity are: 1) an uncontrolled confounder causing both independent and dependent variables of a model; and 2) a loop of causality between the independent and dependent variables of a model.

https://en.wikipedia.org/wiki/Endogeneity_(econometrics)
Philip Gigliotti

Join Date: Nov 2016

Posts: 118
#11

05 Sep 2017, 17:14

In econometrics, that is called simultaneity. Endogeneity comprises simultaneity, omitted variable bias, autocorrelation, and measurement error. Check the Wikipedia page on endogeneity; the forum won't let me post a link without review.
Clyde Schechter

Join Date: Apr 2014

Posts: 29913
#12

05 Sep 2017, 22:24

That is interesting; thanks for the reference. I have not seen the term used that way before. It seems an odd definition, throwing together a bunch of rather different situations. But if that is how econometricians use the term, then, at least in the econometrics context, others should use it that way too. I think if used this way in an epidemiology paper it would be misunderstood. I'm less sure about other disciplines.