Mean Centering

Rachel Sleeps

Join Date: May 2015

Posts: 64
#1

01 Jun 2015, 00:53

Hi,

I am using Stata 13 to estimate a simple model with an interaction term. To give the interaction term a meaningful interpretation at the value zero and to avoid multicollinearity, I am centering the variables.
However, the resulting mean is not exactly at zero.

Here is an example.

Code:

clear
sysuse auto.dta
sum price, meanonly
gen cprice = price-r(mean)
sum cprice

The mean of cprice is close to zero but deviates from the fifth decimal place on. Why is that the case? And how can I prevent it?

Thanks
/R
Tags: centering, data transformation, interaction
Maarten Buis

Join Date: Mar 2014

Posts: 3456
#2

01 Jun 2015, 01:59

What you have found is a precision problem. Computers cannot store (fractional) numbers exactly. Think of the number 1/3: if a computer wanted to store that number exactly it would need to store 0.33333333333333..., i.e. an infinite number of digits, and you can imagine the amount of memory needed to do so. To make it more complex, computers think in terms of binary numbers, not decimal numbers. Numbers that are perfectly simple in decimal can have that 1/3 property in binary. For example, 0.1 in decimal is 0.000110011001100... in binary, with the block 0011 repeating forever. Stata (and other computer programs) solve this by rounding. By default, new variables are stored as floats, which hold roughly 7 significant decimal digits. You can increase the precision by creating doubles, which hold approximately 16 decimal digits.

Code:

clear
sysuse auto.dta
sum price, meanonly
gen cprice1 = price-r(mean)
gen double cprice2 = price-r(mean)
sum cprice1 cprice2
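To see the rounding directly, here is a minimal sketch using the 0.1 example. It relies only on the built-in float() function and the %21x hexadecimal display format; the choice of 0.1 as the test value is just for illustration.

```stata
* 0.1 has no exact binary representation; %21x shows what is actually stored
display %21x 0.1
display %21x float(0.1)

* more decimal digits than usual make the rounding visible
display %20.18f 0.1
display %20.18f float(0.1)

* the double and float versions of 0.1 are different numbers: this displays 0 (false)
display 0.1 == float(0.1)
```

The same thing happens to cprice1: each value of price-r(mean) is rounded to float precision when it is stored, and those small rounding errors are what show up in the mean.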

Having said all that, I typically prefer to center a variable at some meaningful value. For example, if we have years of education in the American system, I would center at 12 years of education, corresponding to a high school degree; if I have occupational status on the ISEI scale, I tend to center at 40, corresponding to a skilled worker (e.g. a watchmaker); and if I have year of birth, I would typically center that at 1950 or 1960. These are fixed numbers that have meaning to your audience. Talking about high school graduates, watchmakers or people born in 1960 makes your story more concrete than talking about someone with a mean number of years of education, mean occupational status or mean birth year. Another problem with centering at the mean is that the mean will change from sample to sample, making it more difficult to compare different results. To summarize, centering at the mean is not wrong, but you can do better if you have such meaningful values close to the middle of the distribution.
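As a rough sketch of centering at a fixed value, the example below uses the auto data from earlier in the thread. The variable weight, the centering value of 3,000 lbs and the interaction with foreign are assumptions chosen purely for illustration; they are not from the original question.

```stata
sysuse auto, clear

* center weight at a fixed, round value (3,000 lbs, an illustrative choice)
* storing the result as a double avoids the float rounding discussed above
generate double cweight = weight - 3000

* interaction model using factor-variable notation
regress price c.cweight##i.foreign
```

With this coding, the coefficient on foreign is the expected price difference between foreign and domestic cars that weigh 3,000 lbs, and that reference point stays the same from sample to sample.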

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
Rachel Sleeps

Join Date: May 2015

Posts: 64
#3

01 Jun 2015, 02:54

Thanks a lot! Great answer.