Hi guys,
I'm running an areg regression (the command is shown in the code block below) in which all the independent variables are categorical. This specification runs and gives me results. However, when I add "language" as another independent variable, I get the "omitted because of collinearity" problem. Note that "language" is the official language of the country, so although there is no variation within a country, the same language can be spoken in several countries. When a country has more than one official language, I took the one used by the majority (e.g. Belgium -> Dutch, Canada -> English).
Code:
areg y i.country i.gender, absorb(vd3)
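For reference, here is a minimal sketch of how I would check that language really is constant within each country in the data. It assumes the variables are named country and language, as in the description above; adjust the names if yours differ.
Code:
* Hypothetical check: variable names country and language are assumed
* Sort by language within country; if the first and last values match, language does not vary within that country
bysort country (language): assert language[1] == language[_N]
If the assert passes without error, every observation in a given country has the same language value, so the language dummies carry no information beyond the country dummies.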
First, I wonder whether this is a problem related to my dataset, which is quite big (24 million observations) but perhaps not big enough to handle 100 possible values of country, more than 40 different languages and 794,914 possible values for vd3.
Secondly, even though all the possible values of language are omitted because of collinearity, I do observe a (small) change in the coefficients of the other variables. For example, the coefficient of Uruguay is 0.164 without language in my model and it changes to 0.167 when language is added (but omitted due to collinearity).
Best!
Jean