Panel data- variable omission due to COLLINEARITY/ NO WITHIN GROUP VARIANCE

John Lucas

Join Date: Apr 2024

Posts: 5
#1


03 Apr 2024, 10:09

I have a large panel dataset with individual and parental characteristics. I will be running the regressions on 4 samples: 2 have over 40,000 observations and 2 have under 200 observations. Some variables have missing values because the data come from a household survey, but beyond the typical attrition concerns I don't expect that to be the main cause?

I am running xtreg for continuous dependent variables and xtlogit for dummy outcomes (probability of inactivity). See my code for the regressions below. To avoid overfitting, I am running one regression to observe the impact of the individual's characteristics on the outcome variable, and a separate one to see the relationship between parental characteristics (using equivalent variables) and the individual's outcome variable.

I keep getting 'omitted because of collinearity' for many of the most important independent variables, e.g. 'i.hiqual_dv' (highest qualification) and 'occupation_group' (occupation worked in). I don't see how there could be so much collinearity between these groups when, say, for occupation I aggregated about 30 occupations down to 7 groups. For some of the regressions Stata runs 300 iterations, each giving the identical log-likelihood followed by (not concave), and ends with 'convergence not achieved'.

How could I fix the problems outlined here?

Clyde Schechter

Join Date: Apr 2014

Posts: 29796
#2

03 Apr 2024, 10:21

Please read the Forum FAQ, with special attention to #12 for helpful advice on the best ways to show information here. In particular, screenshots are not as helpful as you might imagine. Of the three you posted here, only the last one is readable on my setup. Other people may not be able to read any of them. The lucky few can read them all. Moreover, even that last readable screenshot shows only part of the output, so that important information that may bear on your problem is not shown.

All of that said, on the guess that you are running fixed-effects regressions (you say -xtreg- without further specification, and the third screenshot doesn't include the part of the output that would show that), it is likely that the collinearity you are encountering is collinearity with the fixed effects themselves. Variables such as hiqual_dv are very likely not to change over time within the individual (at least if the surveyed population is already done with schooling). Similarly, occupational group is likely to be unchanging, at least over short or moderate periods of time.
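One way to check this guess is to look at how much each variable actually varies within panels. The following is a minimal sketch, not the original poster's code: the identifier and time variable names (pidp, wave) are assumptions for illustration and should be replaced with whatever the dataset uses.

```stata
* Assumed names: pidp (panel identifier) and wave (time variable) are
* hypothetical; substitute the variables that define your panel.
xtset pidp wave

* xtsum splits each variable's standard deviation into "between" and
* "within" components. A within SD of (or very near) zero means the
* variable essentially never changes inside a panel.
xtsum hiqual_dv occupation_group

* For a categorical variable, xttab reports how consistently panels
* stay in the same category across observations.
xttab occupation_group
```

If the within variation turns out to be zero or close to it, that would be consistent with the variables being omitted because they are collinear with the fixed effects rather than with each other.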

One of the properties of fixed-effects models is that variables that do not change within panel (household in your case?) are collinear with the panel fixed effects themselves, and consequently they cannot be included in the regression model, nor is there any way to estimate their effects. If you are including these variables simply to adjust ("control") for their effects on your outcome variable, there is no need to do that because one of the pleasant features of the fixed effects model is that it automatically adjusts for the effects of all measured and unmeasured time-invariant attributes of the panels.
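The reason can be seen from the within (demeaning) transformation behind the linear fixed-effects estimator. The sketch below uses notation assumed for illustration, none of which appears in the original post: y_it is the outcome for panel i at time t, x_it holds the time-varying regressors, z_i holds the time-invariant ones, alpha_i is the panel fixed effect, and bars denote panel means.

```latex
\begin{align*}
y_{it} &= \alpha_i + x_{it}'\beta + z_i'\gamma + \varepsilon_{it} \\
\bar{y}_i &= \alpha_i + \bar{x}_i'\beta + z_i'\gamma + \bar{\varepsilon}_i \\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta + (z_i - z_i)'\gamma
                      + (\varepsilon_{it} - \bar{\varepsilon}_i) \\
                   &= (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
\end{align*}
```

Subtracting the panel mean removes alpha_i and z_i together, so gamma drops out of the estimating equation and cannot be identified, while anything time-invariant is still adjusted for through alpha_i.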
