  • Centering variables

    Hello,

    I have 16 countries observed over 30+ years. My variables are slightly multicollinear, as I have GDP and energy as independent variables.
    How can I center a variable, e.g. GDP, in Stata? And should I center all variables or just those with a high VIF?


  • #2
    Where do you want to center GDP? At the mean? At the median? Somewhere else? Do you want to center it separately for each country, or just once for the 16 countries combined? If you want mean-centering over all 16 countries combined, it would be:
    Code:
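    * Mean of gdp over all countries and years (left in r(mean))
    summarize gdp, meanonly
    * Subtract the overall mean from every observation
    generate gdp_c = gdp - r(mean)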
    That said, centering these variables will do nothing whatsoever to the multicollinearity. Centering can relieve multicollinearity between the linear and quadratic terms of the same variable, but it doesn't reduce collinearity between different variables that are linearly related to each other. In fact, the correlations between the centered variables will be exactly the same as before centering.
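
    A minimal sketch of that last point, run on a made-up toy panel rather than the poster's data. The variable names country, gdp and energy and all the numbers are assumptions for illustration only. It also shows country-specific centering, in case that is the centering wanted. Note that the "same correlation" statement applies to centering at the overall mean; centering within country removes the between-country variation and will generally change the correlation.

    ```stata
    * Toy panel, made up for illustration: 16 countries x 30 years
    clear
    set seed 2021
    set obs 480
    generate country = ceil(_n/30)
    generate double gdp = 10 + country + rnormal()
    generate double energy = 0.8*gdp + rnormal()

    * Centering at the overall mean (all countries combined)
    summarize gdp, meanonly
    generate double gdp_c = gdp - r(mean)
    summarize energy, meanonly
    generate double energy_c = energy - r(mean)

    * Country-specific centering, if that is what you want instead
    bysort country: egen double gdp_cmean = mean(gdp)
    generate double gdp_cw = gdp - gdp_cmean

    * Correlation before and after centering at the overall mean
    correlate gdp energy
    scalar rho_raw = r(rho)
    correlate gdp_c energy_c
    scalar rho_centered = r(rho)

    * Difference should be zero up to rounding error
    display rho_raw - rho_centered
    ```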

    And that said, why are you fixated on slight multicollinearity? Multicollinearity is the single most overrated "problem" in regression. See Goldberger's econometrics textbook for a lengthy and entertaining takedown of the entire concept. Here's a short, not so entertaining summary:

    1. There is no reason to explore multicollinearity testing. Multicollinearity is a "problem" only to the extent that it inflates the standard errors of the variables that are involved in it. So, if all your standard errors are small enough for the purposes of your research, you may have multicollinearity, but you don't have a multicollinearity problem. Check your standard errors: if they are fine, you are done. Move on.

    2. If the only standard errors that are inflated are those of variables that were included in the model to deal with omitted variable bias (aka confounding), then you need not concern yourself with it. The information provided by those variables has been taken into account in the regression and the bias has been adjusted for. It does not matter that you can't get precise estimates of the effects of those variables: they were included only to adjust for their confounding effects and are not of importance in their own right. Your purpose in including them has been accomplished (and would not be accomplished any better with a non-multicollinear set of variables carrying the same information). Move on.

    3. If the standard error of a focal predictor, that is, of a variable whose effect your goal is to estimate, is inflated, then you have a multicollinearity problem. Unfortunately, it is also a problem that has no solution within your data. No amount of transformation or mathematical magic will solve it. Various little tricks will enable you to get more precise estimates of some other effect, and maybe that is helpful, but usually not. You simply have been defeated. The only actual solution is to obtain more data or better data, or to use a different study design in which the collinearity that entrapped your variable of interest is broken (e.g. a matched design), which usually also entails getting a new data set.
    Last edited by Clyde Schechter; 20 Mar 2021, 15:45.

    • #3
      I certainly agree with Clyde about multicollinearity. But if you use variables in nonlinear ways, such as squares and interactions, then centering can be important. That's because if you don't center, you're usually estimating parameters that have no interpretation, and the VIFs in that case are trying to tell you something.

      • #4
        Dear Clyde,

        Thank you for taking the time to explain. I'm working with panel data, so I'm not sure whether I should mean-center each country separately or all countries combined. Moreover, my GDP variable also enters the model as a squared term, so I think, as Dr. Wooldridge said, centering can be important. The VIFs on gdp and gdp2 are around 1500, while the VIF for energy is 21.


        • #5
          Sundus: As per my point, if you don't center gdp before squaring, then the coefficient on gdp is interpreted as the effect starting from gdp = 0, which is not at all interesting. And you shouldn't hope to estimate it.
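
          A rough sketch of this point on simulated data. The outcome y, the coefficients, and the gdp values are all made up for illustration. With a quadratic in the model, the coefficient on uncentered gdp is the slope at gdp = 0, while the coefficient on centered gdp is the slope at the sample mean. The latter is the same number that margins reports for the uncentered model when evaluated at the mean.

          ```stata
          * Toy data, made up for illustration
          clear
          set seed 2021
          set obs 480
          generate double gdp = 20 + 3*rnormal()
          generate double y = 1 + 0.5*gdp - 0.01*gdp^2 + rnormal()

          * Uncentered: _b[gdp] is the slope at gdp = 0, far outside the data
          regress y c.gdp##c.gdp

          * Slope at the sample mean of gdp, from the same uncentered model
          margins, dydx(gdp) atmeans
          matrix M = r(b)
          scalar slope_at_mean = M[1,1]

          * Centered: _b[gdp_c] is itself the slope at the sample mean of gdp
          summarize gdp, meanonly
          generate double gdp_c = gdp - r(mean)
          regress y c.gdp_c##c.gdp_c

          * Difference should be zero up to rounding error
          display slope_at_mean - _b[gdp_c]
          ```

          The coefficient on the squared term is the same in both regressions; only the linear coefficient and the intercept move.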

          • #6
            Dear Dr. Wooldridge,

            Thank you for your reply. My variables are in logs. Can I center them after taking logs? Also, I have several other control variables. Do I center them all, or just those with high VIFs?

            • #7
              Yes, you can center the logs around their averages. And I would do so for any variable that appears in squares, interactions, and so on.
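
              One way this could look in practice, sketched on made-up positive GDP values (the variable names and numbers are assumptions, not the poster's data): take the log first, center the log at its sample average, and only then build the square. The two correlations show how closely the linear and squared terms track each other before and after centering.

              ```stata
              * Toy data: positive GDP values made up for illustration
              clear
              set seed 2021
              set obs 480
              generate double gdp = exp(9 + 0.5*rnormal())

              * Log first, then center the log at its sample average
              generate double lngdp = ln(gdp)
              summarize lngdp, meanonly
              generate double lngdp_c = lngdp - r(mean)

              * Squared terms built from the raw and from the centered log
              generate double lngdp2 = lngdp^2
              generate double lngdp_c2 = lngdp_c^2

              * Linear vs squared term: raw logs, then centered logs
              correlate lngdp lngdp2
              scalar rho_raw = r(rho)
              correlate lngdp_c lngdp_c2
              scalar rho_centered = r(rho)
              display rho_raw
              display rho_centered
              ```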

              • #8
                Dear Dr. Wooldridge,

                I have a question about the coefficient on the centered log of GDP. If I wanted to interpret it in constant 2010 US$, would reversing the natural log and then adding the mean make sense to achieve that?
