
  • Beta Coefficients

    Dear Stata Users,

    I would like to obtain beta (standardized) coefficients to compare the magnitude of the effects of independent variables.

    We can obtain them in Stata by

    reg y x1 x2 x3, beta

    It is also possible to first standardize the variables, for example with the user-written center command, and then run the regression again.
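
    For example, here is a minimal by-hand sketch (using the variable names from the command above) that reproduces the beta coefficients:

    Code:
      // standardize the outcome and the regressors, then rerun the regression
      foreach v in y x1 x2 x3 {
          quietly summarize `v'
          generate double z_`v' = (`v' - r(mean)) / r(sd)
      }
      // the slopes match those from: reg y x1 x2 x3, beta
      regress z_y z_x1 z_x2 z_x3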

    I have two questions:

    1. How should we obtain beta coefficients for panel data? Should I use the pooled mean and standard deviation to calculate standardized variables?

    2. When there is another endogenous variable (and panel data) as below, how can I get the beta coefficients?

    xtivreg2 y1 (y2=z1 z2) x1 x2 x3, fe bw(2) robust

    Many thanks in advance for any help.

  • #2
    Good questions. I do not have an answer, but I wanted to remind you and others that standardized coefficients do not really help us compare the effects of variables.

    A regression coefficient represents the (causal) effect of a given predictor (provided certain assumptions are met). This is usually what we are interested in. A standardized regression coefficient gives us a mixture of a (causal) effect and the distribution of a variable (perhaps even restricted to our sample). This is usually not what we are interested in. To give an example, say you have two predictors, x and z, and assume the data generating process is

    y = 2.5 * x + 2.5 * z + e

    Note that x and z have the same "true" (causal) effect on y, which is 2.5. Now assume the standard deviation of z is twice as large as that of x. If you run a regression of y on x and z, the coefficients for x and z will be identical (close to identical in real data), leading us to conclude, correctly, that x and z indeed have the same effect of 2.5. The standardized coefficient of z, on the other hand, will be twice as large as that of x, leading us to the wrong conclusion that z has a larger effect on y.

    You can see this in the following example, which I put together for teaching:

    Code:
      // the tale of standardization
    
    // create some data
    qui {
        clear
        se obs 1000
        // clear the screen for the demonstration
        se more off
        n di _n(50)
        se more on
        // make the draws reproducible
        se seed 42
        // x and z correlate at 0.3, with sds of 2 and 10
        mat C = (1, 0.3) \ (0.3, 1)
        corr2data x z , corr(C) sd(2 10) double
        // standard normal noise
        g double e = invnormal(runiform())
    }
    
    // we created 1000 observations
    // 2 right-hand side variables x and z
    // and a random noise e
    d
    su x z
    
    // note that x and z are measured on the same scale
    // however, the standard deviation of z is five times
    // the sd of x
    
    // the data generating process is
    g double y = 2.5 * x + 2.5 * z + e
    
    // let's say we want to compare the effects of x and z on y
    // (ignore for the moment that these kinds of comparisons
    // do not make a lot of sense in most situations - we will talk
    // about that later)
    
    // we know that both x and z have a "true" causal effect of 2.5
    
    // to estimate (causal) effects we run a linear regression model
    more
    reg y x z
    
    // the effects of both x and z are estimated quite close to the "truth"
    // from the model we would conclude that the effects of x and z on y
    // are almost identical
    // that is what we would expect
    
    // let's standardize the variables
    more
    foreach var in x z {
        su `var'
        g double z_`var' = (`var' - r(mean))/r(sd)
    }
    su z_*
    // both of our variables have mean 0 and standard deviation of 1
    // now we run the model with the standardized variables
    more
    reg y z_x z_z
    
    // the "effect" of z_x is estimated to be 4.95
    // the coefficient is twice the one estimated for the variable in
    // its original scale
    // the "effect" of z_z is estimated to be 25
    // that is more than five times the coefficient for z_x
    
    // fully standardized coefficients (beta coefficients) do not help
    // either
    // we can do this "by hand" like this
    more
    qui su y
    g double z_y = (y - r(mean))/r(sd)
    reg z_y z_x z_z
    
    // or Stata will do that for us
    more
    reg y x z, beta
    
    // so what do we conclude from all of this?
    // does z really have an effect on y that is more than five(!)
    // times that of x?
    
    // clearly we know that this is not the case
    // so how does standardization help us interpret results?
    
    more
    // if you ask me: not one bit
    // instead it complicates interpretation because nobody can make
    // intuitive sense out of a standard deviation
    
    // the take home message is: think carefully about what exactly a
    // comparison means - from a theoretical point of view - and how
    // to make such a comparison meaningful
    
    // and that, folks, is the story of how standardization DOES NOT
    // help us compare effects at all!

    You can read more on comparing effects from a more general perspective in King (1986).

    Best
    Daniel

    King, Gary (1986). How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science. American Journal of Political Science, 30: 666-687.

    Last edited by daniel klein; 17 Apr 2015, 02:33.



    • #3
      Daniel,

      It seems to me that the point of standardizing variables is to see the effects in terms of standard deviations. This seems to make sense because, for example, assuming all variables are normal, there is an approximately 68% probability of observing a value within one standard deviation of the mean for each variable. So our question is: if we observe a value of an independent variable within one standard deviation of its mean (an event with probability around 68%), what is its effect on y, in terms of standard deviations?
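
      For reference, the 68% figure is easy to check in Stata, where normal() is the standard normal cumulative distribution function:

      Code:
        display normal(1) - normal(-1)   // probability within one sd of the mean, about .6827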

      In your example, since z has a larger standard deviation, it is natural that it will have a larger effect, because we are more likely to observe large values of abs(z) than of abs(x) in the data set.

      What do you think?

      Back to my original question:

      I tried to standardize the variables in the following regression

      xtivreg2 y1 (y2=z1 z2) x1 x2 x3, fe bw(2) robust

      and then ran

      xtivreg2 z_y1 (z_y2=z_z1 z_z2) z_x1 z_x2 z_x3, fe bw(2) robust

      but the significance of some coefficients changes. I think it should not change merely because of the standardization, so I assume I am doing something incorrectly.
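
      (By "standardize" I mean something along the lines of the loop in #2, using pooled means and standard deviations; whether the pooled moments are even the right choice here is exactly my question 1 above.)

      Code:
        foreach v in y1 y2 z1 z2 x1 x2 x3 {
            quietly summarize `v'
            generate double z_`v' = (`v' - r(mean)) / r(sd)
        }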

      Regards,
      A.B.



      • #4
        "What do you think?"

        I think there might be rare situations in which you are really interested in comparing the effect of a one standard deviation change in z with the effect of a one standard deviation change in x. However, such comparisons often translate into statements like: "One standard deviation of the number of prisons built has a larger effect on reducing crime rates than one standard deviation of the number of police officers employed." But what do you make of such a statement as, say, a politician? It might mean: "0.001 more prisons would reduce crime rates more than employing 2 new police officers." Still, we cannot make much sense of this.

        If we instead have statements like: "One more prison reduces crime rates by b1. One additional police officer reduces crime rates by b2.", which unstandardized regression coefficients give us directly, we can, for example, work out how many police officers would have the same effect on crime reduction as one new prison. We can then calculate the costs of either option and decide where to best spend our money. I think this is much more straightforward than interpreting standard deviations.
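
        To illustrate with purely hypothetical numbers (not estimates from any model): if b1 = -10 and b2 = -0.4 (crimes per year), one new prison is "worth" b1/b2 = 25 police officers, a comparison one can attach costs to:

        Code:
          display -10 / -0.4   // hypothetical b1/b2: officers equivalent to one prison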

        Note, by the way, that we were here comparing variables that are measured on the same scale: number of ... It might be even more confusing if one standard deviation of x has a completely different meaning than one standard deviation of z.

        Still no idea on your original question, sorry. I hope someone comes up with one.

        Best
        Daniel
        Last edited by daniel klein; 17 Apr 2015, 03:41.
