  • Solving perfect multicollinearity by dropping constant (how does noconstant work?)

    I have explanatory variables that are proportions of the population in each age group, so the values add to 1 for each observation. When I regress y on those x's, one of the x's is omitted and a constant term is estimated (as I expected). I tried running the same regression with the noconstant option; Stata eliminated the constant but still omitted one of the x's. Can someone help me figure out how to include all the x's without the constant term? Is noconstant not the right option for this?
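
    For what it's worth, noconstant is an option to regress rather than a separate command. If the only linear dependency were the proportions summing to one, dropping the constant should let all the groups enter. A minimal sketch, using hypothetical variables y and p1-p4 (where p1 + p2 + p3 + p4 = 1 in every observation):

    Code:
    regress y p1 p2 p3 p4             // one p omitted, constant estimated
    regress y p1 p2 p3 p4, noconstant // all four p's should be retained

    If a variable is still omitted under noconstant, Stata has found a second collinear relationship among the x's themselves.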

  • #2
    Maybe another one of your variables is collinear with the x variables you included. Please show us what you typed and Stata's output.
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com



    • #3
      Dear Howard,

      I do not think it is a good idea to include all the variables and drop the constant. My problem with doing this is that the interpretation of the coefficients becomes tricky: because the variables add up to one, you cannot change one while keeping the others fixed (if one increases, at least one of the others has to decrease). That is, there is no ceteris paribus interpretation in such a model.

      Even if you include the constant and drop one of the variables, you need to keep in mind that the coefficients give you the effect of a change in one of the regressors when that change is offset by a change in the excluded category. Obviously, the results may change dramatically if you change the excluded category. In short, you need to be very careful when interpreting the results of models with variables like that.

      All the best,

      Joao



      • #4
        My "theory", such as it is: TFP_it = a_i + g*Z_i + sum_j b_j*X_jit for country i and time period t, where the Z_i are unmeasured fixed effects. (TFP is total factor productivity, but it could be anything.) The X_j are the proportions of the population in each age group (for example, four age groups: children, young workers, old workers, and retired). To restate, in case it wasn't clear: the X's are the percentage of the total population in each age group. So, consistent with the "theory", I wanted to estimate the first-differenced equation TFP_(t+1) - TFP_t = sum_j b_j*[X_j(t+1) - X_jt] to get estimates of the b_j. The X_j by definition sum to 1, and therefore are perfectly collinear with the constant (the vector of 1s). (ppc014 is the population percentage aged 0 to 14, etc., so these are my X_j's.)

        So attempt 1:
        Code:
        reg TFP ppc014 ppc1539 ppc4064 ppc65

        yields:
        note: ppc4064 omitted because of collinearity

        Source | SS df MS Number of obs = 1,470
        -------------+---------------------------------- F(3, 1466) = 10.48
        Model | 53467.6526 3 17822.5509 Prob > F = 0.0000
        Residual | 2494279.7 1,466 1701.41862 R-squared = 0.0210
        -------------+---------------------------------- Adj R-squared = 0.0190
        Total | 2547747.36 1,469 1734.34129 Root MSE = 41.248

        ------------------------------------------------------------------------------
        TFP | Coef. Std. Err. t P>|t| [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        ppc014 | -503.5971 99.80355 -5.05 0.000 -699.3701 -307.8241
        ppc1539 | -467.504 102.6745 -4.55 0.000 -668.9085 -266.0995
        ppc4064 | 0 (omitted)
        ppc65 | -338.6578 277.1404 -1.22 0.222 -882.2918 204.9762
        _cons | 7.128263 1.264061 5.64 0.000 4.648702 9.607825
        ------------------------------------------------------------------------------


        Attempt 2:
        Code:
        reg TFP ppc014 ppc1539 ppc4064 ppc65, nocon

        yields:

        note: ppc4064 omitted because of collinearity

        Source | SS df MS Number of obs = 1,470
        -------------+---------------------------------- F(3, 1467) = 24.37
        Model | 126993.145 3 42331.0484 Prob > F = 0.0000
        Residual | 2548385.33 1,467 1737.14065 R-squared = 0.0475
        -------------+---------------------------------- Adj R-squared = 0.0455
        Total | 2675378.48 1,470 1819.98536 Root MSE = 41.679

        ------------------------------------------------------------------------------
        TFP | Coef. Std. Err. t P>|t| [95% Conf. Interval]
        -------------+----------------------------------------------------------------
        ppc014 | -620.1051 98.66135 -6.29 0.000 -813.6375 -426.5728
        ppc1539 | -472.1517 103.7434 -4.55 0.000 -675.6529 -268.6506
        ppc4064 | 0 (omitted)
        ppc65 | 62.1795 270.6674 0.23 0.818 -468.7568 593.1158
        ------------------------------------------------------------------------------
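
        One way to see directly which dependency Stata is reacting to might be _rmcoll, which returns the varlist with the collinear terms marked as omitted; a sketch, assuming the same data in memory:

        Code:
        _rmcoll ppc014 ppc1539 ppc4064 ppc65
        display r(varlist)
        _rmcoll ppc014 ppc1539 ppc4064 ppc65, noconstant
        display r(varlist)

        If ppc4064 is flagged even with noconstant, the dependency does not involve the constant.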



        • #5
          It looks like Jorge is right - your problem is not just from the x's adding to one. Using only the sample actually usable in the regression, examine the data. Sum the four and see what you get. Look at the correlations among the four x's. Try regressing each x on the other three.
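
          These checks might look like the following in Stata (assuming the regression was just run, so that e(sample) marks the usable observations):

          Code:
          gen byte insample = e(sample)
          egen double ppsum = rowtotal(ppc014 ppc1539 ppc4064 ppc65)
          summarize ppsum if insample
          correlate ppc014 ppc1539 ppc4064 ppc65 if insample
          regress ppc4064 ppc014 ppc1539 ppc65 if insample

          If ppsum is not exactly 1 everywhere, or the auxiliary regression has an R-squared of 1, that pins down where the extra dependency comes from.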



          • #6
            It is possible that you did not correctly create ppc4064 and it has a constant value. As Phil suggests, if you haven't already done so, it's time to look at your data. Even something as simple as
            Code:
            codebook TFP ppc014 ppc1539 ppc4064 ppc65
            could be illuminating.
