
  • Mean Centering

    Hi,

    I am using Stata 13 to estimate a simple model with an interaction term. To give the interaction term a meaningful interpretation at value zero and to avoid multicollinearity, I am centering the variables.
    However, the resulting mean is not exactly at zero.

    Here is an example.

    Code:
    clear
    sysuse auto.dta
    
    sum price, meanonly
    gen cprice = price-r(mean)
    
    sum cprice
    The mean of cprice is close to zero but deviates from it from the fifth decimal place on. Why is that the case? And how can I prevent it?

    Thanks
    /R


  • #2
    What you have found is a precision problem. Computers cannot store most fractional numbers exactly. Think of the number 1/3: to store it exactly in decimal, a computer would need to store 0.33333333333333..., i.e. an infinite number of digits, and you can imagine the amount of memory needed to do so. To make it more complex, computers think in terms of binary numbers, not decimal numbers. Numbers that are perfectly simple in decimal can have that 1/3 property in binary. For example, 0.1 in decimal is 0.000110011001100... in binary, with the block 0011 repeating forever. Stata (and other computer programs) solve this by rounding. By default, new variables are stored as floats, which hold approximately 7 decimal digits. You can increase the precision by creating doubles, which hold approximately 16 decimal digits.

    Code:
    clear
    sysuse auto.dta
    
    sum price, meanonly
    
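    * default storage type (float): rounded to about 7 significant digits
    gen cprice1 = price-r(mean)
    * double: about 16 significant digits
    gen double cprice2 = price-r(mean)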
    
    sum cprice1 cprice2
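    The rounding itself can be made visible with display. This is only a small sketch, not part of the example above: it relies on the float() function, which rounds its argument to float precision, and on a wide display format chosen here purely for illustration.

    Code:
    * 0.1 as Stata holds it in double precision
    display %21.18f 0.1
    
    * 0.1 after rounding to float precision
    display %21.18f float(0.1)
    
    * the two stored values differ, so the comparison is false (0)
    display float(0.1) == 0.1
    Neither stored value is exactly 0.1; the double is simply much closer.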
    Having said all that, I typically prefer to center a variable at some meaningful value. For example, if we have years of education in the American system, I would center at 12 years of education, corresponding to a high school degree; if I have occupational status in the ISEI score, I tend to center at 40, corresponding to a skilled worker (e.g. a watchmaker); and if I have year of birth, I would typically center that at 1950 or 1960. These are fixed numbers that have meaning to your audience. Talking about high school graduates, watchmakers, or people born in 1960 makes your story more concrete than talking about someone with a mean number of years of education, mean occupational status, or mean birth year. Another problem with centering at the mean is that the mean will change from sample to sample, making it more difficult to compare different results. To summarize, centering at the mean is not wrong, but you can do better if you have such meaningful values close to the middle of the distribution.
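    As a rough sketch of what centering at a fixed value could look like in the auto data: the value 6000 and the model below are assumptions made up for illustration (6000 is just a round number near the middle of the price distribution, not a value with any substantive meaning), so substitute whatever is meaningful in your own data.

    Code:
    clear
    sysuse auto.dta
    
    * 6000 is an arbitrary round number, chosen only for illustration
    gen double cprice = price - 6000
    
    * hypothetical model: main effects plus the interaction with weight
    regress mpg c.cprice##c.weight
    The main effect of weight then refers to a car priced at 6000. The mean of cprice will not be zero, and it does not need to be.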
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      Thanks a lot! Great answer.
