I have a large panel dataset with individual and parental characteristics. I will be running the regressions on four samples: two with over 40,000 observations and two with under 200. Because the data come from a household survey, there are missing values, but given typical levels of attrition I don't expect missingness to be the main cause of the problems below. Is that a safe assumption?
I am running xtreg for continuous dependent variables and xtlogit for binary ones (probability of inactivity). See my regression code below. To avoid overfitting, I run one regression to estimate the effect of the individual's own characteristics on the outcome, and a separate one to estimate the relationship between parental characteristics (using equivalent variables) and the individual's outcome.
I keep getting 'omitted because of collinearity' for many of the most important independent variables, such as 'i.hiqual_dv' (highest qualification) and 'occupation_group' (the occupation the individual worked in). I don't see how there could be so much collinearity among these variables when, for example, I aggregated about 30 occupations into just 7 groups. For some of the regressions, Stata runs 300 iterations reporting an identical log-likelihood each time, flagged (not concave), and then ends with 'convergence not achieved'.
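The following is a minimal sketch of four textbook ways these messages can arise. It is written in Python/NumPy only because it is easy to run; it is not a translation of the Stata code in the screenshot, and the toy panel (`ids`, `age`, `degree`, `occ`) and the helper functions are hypothetical. The four cases are: (1) with fixed effects (`xtreg, fe` or `xtlogit, fe`), a variable that never changes within an individual, such as a settled highest qualification or most parental characteristics, is wiped out by the within (demeaning) transformation; (2) a dummy for every category alongside a constant; (3) in a small subsample, a category that nobody falls into; and (4) in a logit, a regressor that perfectly predicts the outcome, so the log-likelihood keeps creeping towards 0 with no finite maximum. Which of these, if any, applies here depends on details the post does not show, for example whether the models use the `fe` or `re` option.

```python
import numpy as np


def within_transform(x, ids):
    """Demean x within each panel id (the transformation behind fixed-effects estimators)."""
    out = np.asarray(x, dtype=float).copy()
    for i in np.unique(ids):
        mask = ids == i
        out[mask] -= out[mask].mean()
    return out


def rank_deficiency(X):
    """How many columns of X are linear combinations of the others (i.e. must be omitted)."""
    X = np.asarray(X, dtype=float)
    return X.shape[1] - np.linalg.matrix_rank(X)


def logit_loglik(beta, x, y):
    """Log-likelihood of a one-regressor logit with no constant."""
    p = 1.0 / (1.0 + np.exp(-beta * x))
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))


# Hypothetical toy panel: 3 individuals observed in 3 waves each.
ids = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
age = np.array([20, 21, 22, 30, 31, 32, 40, 41, 42])  # varies within a person
degree = np.array([1, 1, 1, 0, 0, 0, 1, 1, 1])        # fixed for each person

# 1) Fixed effects: a time-invariant regressor becomes a column of zeros.
X_fe = np.column_stack([within_transform(age, ids), within_transform(degree, ids)])
print(rank_deficiency(X_fe))     # 1 -> 'degree' is omitted

# 2) Dummy-variable trap: a dummy for every group plus a constant.
occ = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
X_trap = np.column_stack([np.ones(9), np.eye(3)[occ]])
print(rank_deficiency(X_trap))   # 1 -> one group must be the base category

# 3) Small subsample: a category nobody falls into is an all-zero column.
occ_small = np.array([0, 1, 0, 1, 0, 1])
X_small = np.column_stack([np.ones(6), np.eye(3)[occ_small][:, 1:]])  # group 0 = base
print(rank_deficiency(X_small))  # 1 -> group 2 is omitted

# 4) Perfect prediction: x separates y exactly, so the log-likelihood keeps
#    rising towards 0 as beta grows and there is no finite maximum to reach.
x = np.array([-1.0, -1.0, 1.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
print([round(logit_loglik(b, x, y), 4) for b in (1, 5, 10, 20)])
```

In Stata terms, the analogous checks would be tabulating each categorical variable within each of the four samples, and using `xtsum` to see whether a variable has any within-individual variation at all.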
How could I fix the problems outlined here?
data:image/s3,"s3://crabby-images/0a512/0a5122a22bfe3b14a3a921418aadecf55b13842e" alt="Click image for larger version
Name: Screenshot 2024-04-03 at 18.01.42.png
Views: 2
Size: 212.7 KB
ID: 1748760"