  • Centering at the mean

    Dear all,
    My question is related to centering at the grand mean. In one of the papers I use as a guideline I found the following:

    'We follow Cohen et al.’s (2003) recommendations to center the industry-level variable (herfindahl-index) at the grand mean and also center the firm-level variables (cash/assets) by the industry mean when testing their interaction effect (Martin, Cullen, and Parboteeah, 2007)'.

    Does this mean that I have to replace, for example, Cash/Assets by (Cash/Assets - Average of Cash/Assets)?
    Could someone explain what the statistical reasoning is behind this?

    Kind regards,
    Emiel Brak

  • #2
    In this context, centering is very common in mixed / multilevel / hierarchical models. One of the major advantages of these models is that they can separate the effect of a variable at different levels and can explain the effect of one independent variable by other independent variables (cross-level interactions). That is, the intercept and the slopes of the lower-level units (e.g. firm level) can be explained by variables at a higher level (e.g. industry level). To give a meaningful interpretation to the random slopes (the slopes that vary between the units of analysis and so become dependent on a higher-level variable), it is very common to center the corresponding variable. Variables whose slopes vary between units should be centered around their group mean (here, the industry mean), and variables that explain those slopes are generally centered around the grand mean.
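
    As a minimal sketch of the two kinds of centering described in the quoted passage, assuming a firm-level dataset with hypothetical variable names industry, cash_assets and herfindahl (none of these names come from the original post), something like the following could be used:

    Code:
    * Hypothetical variable names: industry, cash_assets, herfindahl
    * Firm-level variable centered at its industry (group) mean
    bysort industry: egen double ca_ind_mean = mean(cash_assets)
    generate double ca_c = cash_assets - ca_ind_mean
    * Industry-level variable centered at the grand mean
    summarize herfindahl, meanonly
    generate double hhi_c = herfindahl - r(mean)

    Note that the grand mean here is taken over all firm-level observations, so industries with more firms weigh more heavily; whether that or a mean over industries is intended depends on the paper being followed.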
    Last edited by Oded Mcdossi; 02 Nov 2015, 15:44.

    • #3
      I can't think of any context in which you "have to" do centering; there are circumstances where it's helpful and perhaps yours is one of them.

      You don't state what kind of analyses you are planning to do, but the commonest circumstance where one centers variables is regression models with interaction terms (and especially in multilevel models). The rationale is this. If you run an analysis like:

      Code:
      regress y c.x1 c.x2 c.x1#c.x2 // for continuous x1 and x2; use i. in place of c. for categorical variables
      the coefficient of x2 in the output is the effect of x2 conditional on x1 = 0, and the coefficient of x1 is the effect of x1 conditional on x2 = 0. Now in many situations the nature of the variables x1 and x2 is such that 0 is not a meaningful value for one or both of them, and for some variables 0 may not even be a possible value. Coefficients conditioned that way are simply not useful. The solution is to center the variables around values that are meaningful and useful to condition on. The mean is often such a value. The median is another. Sometimes even the minimum or maximum makes sense, or, in longitudinal data, the first value observed in each panel. There are lots of choices, but the purpose is in all cases to structure the model so that the regression coefficients are interpretable in the context of the domain being addressed. The same considerations apply to regression models generally.
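
      To make this concrete: with the model y = b0 + b1*x1 + b2*x2 + b3*x1*x2, the effect of x2 is b2 + b3*x1, which equals b2 only when x1 = 0. A minimal sketch of centering at the mean, assuming continuous x1 and x2 and using the hypothetical new variable names x1_c and x2_c, might look like:

      Code:
      * Center each predictor at its sample mean (x1_c, x2_c are illustrative names)
      summarize x1, meanonly
      generate double x1_c = x1 - r(mean)
      summarize x2, meanonly
      generate double x2_c = x2 - r(mean)
      * Main effects plus interaction of the centered variables
      regress y c.x1_c##c.x2_c

      In this version the coefficient of x2_c is the effect of x2 when x1 is at its mean, and vice versa; the interaction coefficient is the same as in the uncentered model.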

      By the way, citations such as Cohen et al. (2003) or Martin, Cullen, and Parboteeah (2007) are not helpful on this forum. Many, many different professions are represented in the membership, and only those in your domain or closely related ones will have any idea what you are referring to. If you think it is important that we familiarize ourselves with the contents of these articles, full citations are needed.
