  • Solving perfect multicollinearity by dropping constant (how does noconstant work?)

    I have explanatory variables that are proportions of the population in each age group, so the values add to 1 for each observation. When I regress y on those x's, one of the x's is omitted and a constant term is estimated (as I expected). I tried running the same regression with the noconstant option; Stata eliminated the constant but still omitted one of the x's. Can someone help me figure out how to include all the x's without the constant term? Is noconstant not the right option for this?
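
    For what it's worth, noconstant is an option to regress rather than a separate command. If the only linear dependency were the proportions summing to one, dropping the constant should let all the groups enter. A minimal sketch, using hypothetical variables y and p1-p4 (where p1 + p2 + p3 + p4 = 1 in every observation):

    Code:
    regress y p1 p2 p3 p4             // one p omitted, constant estimated
    regress y p1 p2 p3 p4, noconstant // all four p's should be retained

    If a variable is still omitted under noconstant, Stata has found a second collinear relationship among the x's themselves.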

  • #2
    Maybe another one of your variables is collinear with the x variables you included. Please show us what you typed and Stata's output.
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com



    • #3
      Dear Howard,

      I do not think it is a good idea to include all the variables and drop the constant. My problem with doing this is that the interpretation of the coefficients becomes tricky: because the variables add up to one, you cannot change one while keeping the others fixed (if one increases, at least one of the others has to decrease). That is, there is no ceteris paribus interpretation in such a model.

      Even if you include the constant and drop one of the variables, you need to keep in mind that the coefficients give you the effect of a change in one of the regressors when that change is offset by a change in the excluded category. Obviously, the results may change dramatically if you change the excluded category. In short, you need to be very careful when interpreting the results of models with variables like that.

      All the best,

      Joao



      • #4
        My "theory", such as it is: TFP_it = a_i + g*Z_i + sum_j b_j*X_jit for country i and time period t, where the Z_i are unmeasured fixed effects. (TFP is total factor productivity, but it could be anything.) The X_j are the proportions of the population in each age group (for example, four age groups: children, young workers, old workers, and retired). To restate, in case it wasn't clear: the X's are the percentage of the total population in each age group. So, consistent with the "theory", I wanted to estimate the first-differenced equation TFP_(t+1) - TFP_t = sum_j b_j*[X_j(t+1) - X_jt] to get estimates of the b_j. The X_j by definition sum to 1, and therefore are perfectly collinear with the constant (the vector of 1s). (ppc014 is the population percentage aged 0 to 14, etc., so these are my X_j's.)

        So attempt 1:
        Code:
        reg TFP ppc014 ppc1539 ppc4064 ppc65

        yields:
        note: ppc4064 omitted because of collinearity

        Source | SS df MS Number of obs = 1,470
        -------------+---------------------------------- F(3, 1466) = 10.48
        Model | 53467.6526 3 17822.5509 Prob > F = 0.0000
        Residual | 2494279.7 1,466 1701.41862 R-squared = 0.0210
        -------------+---------------------------------- Adj R-squared = 0.0190
        Total | 2547747.36 1,469 1734.34129 Root MSE = 41.248

        ------------------------------------------------------------------------------
        TFP | Coef. Std. Err. t P>|t| [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        ppc014 | -503.5971 99.80355 -5.05 0.000 -699.3701 -307.8241
        ppc1539 | -467.504 102.6745 -4.55 0.000 -668.9085 -266.0995
        ppc4064 | 0 (omitted)
        ppc65 | -338.6578 277.1404 -1.22 0.222 -882.2918 204.9762
        _cons | 7.128263 1.264061 5.64 0.000 4.648702 9.607825
        ------------------------------------------------------------------------------


        Attempt 2:
        Code:
        reg TFP ppc014 ppc1539 ppc4064 ppc65, nocon

        yields:

        note: ppc4064 omitted because of collinearity

        Source | SS df MS Number of obs = 1,470
        -------------+---------------------------------- F(3, 1467) = 24.37
        Model | 126993.145 3 42331.0484 Prob > F = 0.0000
        Residual | 2548385.33 1,467 1737.14065 R-squared = 0.0475
        -------------+---------------------------------- Adj R-squared = 0.0455
        Total | 2675378.48 1,470 1819.98536 Root MSE = 41.679

        ------------------------------------------------------------------------------
        TFP | Coef. Std. Err. t P>|t| [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        ppc014 | -620.1051 98.66135 -6.29 0.000 -813.6375 -426.5728
        ppc1539 | -472.1517 103.7434 -4.55 0.000 -675.6529 -268.6506
        ppc4064 | 0 (omitted)
        ppc65 | 62.1795 270.6674 0.23 0.818 -468.7568 593.1158
        ------------------------------------------------------------------------------
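
        One way to see directly which dependency Stata is reacting to might be _rmcoll, which returns the varlist with the collinear terms marked as omitted; a sketch, assuming the same data in memory:

        Code:
        _rmcoll ppc014 ppc1539 ppc4064 ppc65
        display r(varlist)
        _rmcoll ppc014 ppc1539 ppc4064 ppc65, noconstant
        display r(varlist)

        If ppc4064 is flagged even with noconstant, the dependency does not involve the constant.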



        • #5
          It looks like Jorge is right - your problem is not just from the x's adding to one. Using only the sample actually usable in the regression, examine the data. Sum the four and see what you get. Look at the correlations among the four x's. Try regressing each x on the other three.
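
          These checks might look like the following in Stata (assuming the regression was just run, so that e(sample) marks the usable observations):

          Code:
          gen byte insample = e(sample)
          egen double ppsum = rowtotal(ppc014 ppc1539 ppc4064 ppc65)
          summarize ppsum if insample
          correlate ppc014 ppc1539 ppc4064 ppc65 if insample
          regress ppc4064 ppc014 ppc1539 ppc65 if insample

          If ppsum is not exactly 1 everywhere, or the auxiliary regression has an R-squared of 1, that pins down where the extra dependency comes from.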



          • #6
            It is possible that you did not correctly create ppc4064 and it has a constant value. As Phil suggests, if you haven't already done so, it's time to look at your data. Even something as simple as
            Code:
            codebook TFP ppc014 ppc1539 ppc4064 ppc65
            could be illuminating.
