Dear Stata experts,
In short: the Missing category of mum_age_deliv_cat is shown correctly as long as I leave zdepression and mum_smokes out of the regression, but it is omitted as soon as I include either of them. When I set the Missing category as the base level instead, Stata states that it identifies no observations in the sample. I am at a loss as to why this happens. Hope someone can help me!
I could use some help with the following problem. I am running a multiple OLS regression with (standardized) test scores as the dependent variable and a set of continuous and categorical independent variables. For some of the factor variables, I added an extra category for missing values. This works fine for most categorical variables. However, for mum_age_deliv_cat (maternal age at delivery), Stata drops this category from the output automatically, without stating a reason (collinearity, etc.).
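A minimal sketch of one common way to add such an explicit "Missing" level, assuming the missing values are system-missing (`.`) before recoding. The variable name `agecat`, its value labels and the five toy rows are hypothetical, and this is not necessarily how mum_age_deliv_cat was built. Note that the recode only keeps rows that are missing on this one variable; rows that are missing on any other regressor or on the outcome are still dropped by `regress`.

```stata
* Hypothetical sketch: turn system-missing values of a categorical
* variable into an explicit level 6 = "Missing" (names/labels assumed).
clear
input byte agecat
1
3
.
5
.
end

label define agecat_lbl 1 "<20" 2 "20-24" 3 "25-29" 4 "30-34" 5 "35+" 6 "Missing"
replace agecat = 6 if missing(agecat)   // recode only this variable's missings
label values agecat agecat_lbl
tab agecat, missing                     // level 6 now holds the 2 recoded rows
```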
The code for the full regression is the following:
Code:
```
. regress zks4_GCSE_tot mum_smokes##c.zea1_pgs i.sex ib3.mum_age_deliv_cat zdepression ib3.mum_SES ib3.marital_st_mum ib3.mum_ed_add ib6.cig_change, robust allbaselevels

Linear regression                               Number of obs     =      5,627
                                                F(28, 5598)       =     156.00
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1924
                                                Root MSE          =     .85361

------------------------------------------------------------------------------------------
                         |               Robust
           zks4_GCSE_tot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
              mum_smokes |
           doesn't smoke |          0  (base)
                  smokes |  -.1532702   .0622202    -2.46   0.014    -.2752459   -.0312945
                         |
                zea1_pgs |   .0896178   .0126652     7.08   0.000     .0647892    .1144464
                         |
   mum_smokes#c.zea1_pgs |
           doesn't smoke |          0  (base)
                  smokes |    .055319   .0326839     1.69   0.091    -.0087542    .1193922
                         |
                     sex |
                    Male |          0  (base)
                  Female |   .2701045   .0228218    11.84   0.000     .2253649    .3148442
                         |
       mum_age_deliv_cat |
                     <20 |  -.1404631   .0941917    -1.49   0.136    -.3251154    .0441892
                   20-24 |   -.110315    .036715    -3.00   0.003    -.1822907   -.0383393
                   25-29 |          0  (base)
                   30-34 |   .0396163   .0277931     1.43   0.154    -.0148689    .0941014
                     35+ |   .1217735   .0380242     3.20   0.001     .0472314    .1963156
                         |
             zdepression |  -.0516424   .0123182    -4.19   0.000    -.0757908    -.027494
                         |
                 mum_SES |
                       I |   .1243631   .0588214     2.11   0.035     .0090503    .2396759
                      II |   .0022687   .0313728     0.07   0.942     -.059234    .0637715
 III (non-manual labour) |          0  (base)
     III (manual labour) |  -.1506617    .049965    -3.02   0.003    -.2486125    -.052711
                      IV |  -.1566356   .0502407    -3.12   0.002    -.2551268   -.0581443
                       V |   -.380365   .1006184    -3.78   0.000    -.5776161   -.1831139
                 Missing |  -.2539358   .0404962    -6.27   0.000    -.3333241   -.1745475
                         |
          marital_st_mum |
           Never married |  -.1206388   .0385989    -3.13   0.002    -.1963076   -.0449699
               Separated |  -.1357422   .0572148    -2.37   0.018    -.2479053    -.023579
            Ever married |          0  (base)
                 Missing |  -.1130427   .1802733    -0.63   0.531    -.4664482    .2403628
                         |
              mum_ed_add |
              CSE / None |  -.2884407   .0391122    -7.37   0.000    -.3651157   -.2117657
              Vocational |  -.1568851   .0447715    -3.50   0.000    -.2446547   -.0691156
                O-levels |          0  (base)
                A-levels |   .1809204   .0312505     5.79   0.000     .1196574    .2421835
                  Degree |   .4228745   .0440691     9.60   0.000      .336482    .5092671
                 Missing |  -.0050217   .0820701    -0.06   0.951     -.165911    .1558676
                         |
              cig_change |
             Went off it |  -.1052319   .0450578    -2.34   0.020    -.1935626   -.0169012
                Cut down |   .0007196   .0611025     0.01   0.991    -.1190651    .1205042
             Craved more |  -.0448434   .2700721    -0.17   0.868    -.5742895    .4846027
                Had more |  -.4333814   .0764357    -5.67   0.000    -.5832251   -.2835377
               NO Change |  -.0952739   .0793533    -1.20   0.230     -.250837    .0602893
          Never has this |          0  (base)
                         |
                   _cons |   .1129289   .0281212     4.02   0.000     .0578005    .1680574
------------------------------------------------------------------------------------------
```
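One possible reason a level can vanish without a note: `regress` only uses observations that are non-missing on the outcome and on every regressor (listwise deletion), and a factor level with no rows left in that estimation sample does not appear in the coefficient table at all. The sketch below reproduces this on simulated data; `agecat`, `zdep`, `y` and `insample` are hypothetical stand-ins for mum_age_deliv_cat, zdepression and the test score, and the data are made up. Saving `e(sample)` right after the regression and tabulating the categorical variable on it shows which levels actually entered the model.

```stata
* Simulated data (hypothetical): level 6 plays the role of "Missing",
* and the regressor zdep is missing for every observation in it.
clear
set seed 12345
set obs 1000
generate byte agecat = 1 + mod(_n, 6)     // levels 1..6, about 1/6 each
generate double y = 0.1*agecat + rnormal()
generate double zdep = rnormal()
replace zdep = . if agecat == 6           // listwise deletion drops all of level 6

regress y ib3.agecat zdep                 // level 6 is absent from the table
generate byte insample = e(sample)        // 1 = row used by the last regression
tab agecat if insample                    // level 6 has no rows in the sample
tab agecat if !insample                   // all dropped rows belong to level 6
```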
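For example, the following regression (without mum_smokes, zea1_pgs, zdepression and cig_change) shows the Missing category of mum_age_deliv_cat correctly: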
Code:
```
. regress zks4_GCSE_tot i.mum_age_deliv_cat sex i.marital_st_mum i.mum_ed_add i.mum_SES

      Source |       SS           df       MS      Number of obs   =    11,904
-------------+----------------------------------   F(20, 11883)    =    134.48
       Model |  2197.07793        20  109.853896   Prob > F        =    0.0000
    Residual |  9707.18812    11,883   .81689709   R-squared       =    0.1846
-------------+----------------------------------   Adj R-squared   =    0.1832
       Total |   11904.266    11,903  1.00010636   Root MSE        =    .90382

------------------------------------------------------------------------------------------
           zks4_GCSE_tot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------------+----------------------------------------------------------------
       mum_age_deliv_cat |
                   20-24 |   .0309369   .0453468     0.68   0.495    -.0579502     .119824
                   25-29 |   .1836477   .0453645     4.05   0.000     .0947259    .2725695
                   30-34 |    .246943     .04722     5.23   0.000     .1543841     .339502
                     35+ |   .2855106    .052452     5.44   0.000      .182696    .3883251
                 Missing |   .6171147    .064453     9.57   0.000     .4907763    .7434531
                         |
                     sex |   .2518333   .0165871    15.18   0.000     .2193198    .2843467
                         |
          marital_st_mum |
               Separated |  -.0312996   .0435235    -0.72   0.472    -.1166129    .0540136
            Ever married |   .2037728   .0250902     8.12   0.000     .1545919    .2529536
                 Missing |  -.0736996   .0423146    -1.74   0.082    -.1566431    .0092439
                         |
              mum_ed_add |
              Vocational |   .1721054   .0345848     4.98   0.000     .1043134    .2398973
                O-levels |   .3589388   .0260158    13.80   0.000     .3079437     .409934
                A-levels |   .5817595    .030404    19.13   0.000     .5221626    .6413564
                  Degree |   .9064845   .0398891    22.73   0.000     .8282955    .9846736
                 Missing |   .2245926   .0366603     6.13   0.000     .1527325    .2964527
                         |
                 mum_SES |
                      II |  -.1239549   .0526468    -2.35   0.019    -.2271513   -.0207585
 III (non-manual labour) |  -.0957971   .0546101    -1.75   0.079    -.2028419    .0112477
     III (manual labour) |  -.2680993   .0634493    -4.23   0.000    -.3924703   -.1437283
                      IV |   -.316245   .0617112    -5.12   0.000     -.437209   -.1952809
                       V |   -.420445   .0846138    -4.97   0.000    -.5863018   -.2545882
                 Missing |  -.3964954   .0566912    -6.99   0.000    -.5076195   -.2853714
                         |
                   _cons |  -.8257235   .0739581   -11.16   0.000    -.9706935   -.6807534
------------------------------------------------------------------------------------------
```
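The two regressions above run on different samples (5,627 vs. 11,904 observations). One way to separate "different variables" from "different rows" is to refit the smaller model on exactly the rows the larger model used, via `e(sample)`. A sketch on simulated data with hypothetical names (`agecat`, `zdep`, `y`, `common`); on the real data, the same pattern would wrap the full and the reduced `regress` calls shown above.

```stata
* Simulated data (hypothetical names and values)
clear
set seed 99
set obs 900
generate byte agecat = 1 + mod(_n, 6)
generate double y = 0.1*agecat + rnormal()
generate double zdep = rnormal()
replace zdep = . if agecat == 6 | runiform() < 0.1

regress y ib3.agecat zdep          // larger model: level 6 drops out
generate byte common = e(sample)   // remember the rows it used
regress y ib3.agecat               // smaller model, all rows: level 6 shown
regress y ib3.agecat if common     // smaller model, same rows: level 6 gone
```

If the Missing level disappears from the smaller model once it is restricted to the common sample, the cause is which rows are used rather than the extra regressors themselves.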
I checked manually in the Data Browser whether the observations with a missing mum_age_deliv_cat are the same ones that are missing on mum_smokes or zdepression, but this is not the case. See also:
Code:
```
. tab mum_age_deliv_cat

     Age of |
  mother at |
  delivery, |
    grouped |      Freq.     Percent        Cum.
------------+-----------------------------------
        <20 |        656        4.21        4.21
      20-24 |      2,705       17.38       21.59
      25-29 |      5,440       34.95       56.54
      30-34 |      3,878       24.91       81.46
        35+ |      1,397        8.98       90.43
    Missing |      1,489        9.57      100.00
------------+-----------------------------------
      Total |     15,565      100.00
```
Code:
```
. tab mum_smokes if mum_age_deliv_cat==6

mother smokes |
any amount of |
  cigs during |
    pregnancy |      Freq.     Percent        Cum.
--------------+-----------------------------------
doesn't smoke |        312       78.00       78.00
       smokes |         88       22.00      100.00
--------------+-----------------------------------
        Total |        400      100.00
```
Code:
```
. tab mum_age_deliv_cat if missing(mum_smokes)

     Age of |
  mother at |
  delivery, |
    grouped |      Freq.     Percent        Cum.
------------+-----------------------------------
        <20 |        177        6.20        6.20
      20-24 |        510       17.85       24.05
      25-29 |        572       20.02       44.07
      30-34 |        350       12.25       56.32
        35+ |        159        5.57       61.88
    Missing |      1,089       38.12      100.00
------------+-----------------------------------
      Total |      2,857      100.00
```
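The checks above look at one variable at a time, and the tab of mum_smokes does not condition on zks4_GCSE_tot. What determines the estimation sample is whether a row is non-missing on the outcome and on all regressors at the same time. The simulated sketch below (hypothetical names `agecat`, `smokes`, `y`) builds a case where level 6 looks populated for each variable separately but is empty once both are required together.

```stata
* Simulated data (hypothetical): within level 6, smokes and y are never
* observed together, although each is observed on its own for some rows.
clear
set obs 1200
generate byte agecat = 1 + mod(_n, 6)     // level 6 <=> mod(_n, 12) is 5 or 11
generate byte smokes = mod(_n, 5) == 0    // arbitrary 0/1 indicator
generate double y = mod(_n, 7) / 7        // arbitrary outcome
replace smokes = . if agecat == 6 & mod(_n, 12) == 5
replace y      = . if agecat == 6 & mod(_n, 12) == 11

tab agecat if !missing(smokes)            // level 6 present (100 rows)
tab agecat if !missing(y)                 // level 6 present (100 rows)
tab agecat if !missing(y, smokes)         // level 6 absent: jointly empty
```

On the actual data, the analogous joint check would be along the lines of `tab mum_age_deliv_cat if !missing(zks4_GCSE_tot, mum_smokes)`, adding further regressors as needed.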
Finally, when I try to run the regression with the Missing category set as the base level, this is the output I get:
Code:
```
. regress zks4_GCSE_tot mum_smokes ib6.mum_age_deliv_cat
note: 5.mum_age_deliv_cat omitted because of collinearity
note: 6b.mum_age_deliv_cat identifies no observations in the sample

      Source |       SS           df       MS      Number of obs   =     9,936
-------------+----------------------------------   F(5, 9930)      =    161.76
       Model |  729.671972         5  145.934394   Prob > F        =    0.0000
    Residual |  8958.40157     9,930  .902155244   R-squared       =    0.0753
-------------+----------------------------------   Adj R-squared   =    0.0749
       Total |  9688.07354     9,935  .975145802   Root MSE        =    .94982

-----------------------------------------------------------------------------------
    zks4_GCSE_tot |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
       mum_smokes |  -.4360266   .0240728   -18.11   0.000    -.4832141   -.3888391
                  |
mum_age_deliv_cat |
              <20 |  -.6483968   .0581204   -11.16   0.000    -.7623246    -.534469
            20-24 |   -.466057   .0384945   -12.11   0.000    -.5415141   -.3905999
            25-29 |  -.2004997   .0344241    -5.82   0.000     -.267978   -.1330215
            30-34 |   -.055606   .0358554    -1.55   0.121    -.1258899    .0146779
              35+ |          0  (omitted)
          Missing |          0  (empty)
                  |
            _cons |   .3428121   .0312033    10.99   0.000     .2816473    .4039768
-----------------------------------------------------------------------------------
```
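The two notes fit together if the Missing level has no rows in the estimation sample: the requested base (level 6) is empty, so on every row that is used the indicators for the five remaining levels sum to one. That makes them collinear with the constant, and Stata omits one more level (here 35+). A simulated sketch (hypothetical names `agecat`, `x`, `y`) that should produce the same pair of notes, followed by a fit with a base level that is populated:

```stata
* Simulated data (hypothetical): the outcome is missing for all of level 6.
clear
set seed 7
set obs 600
generate byte agecat = 1 + mod(_n, 6)
generate double x = rnormal()
generate double y = x + 0.1*agecat + rnormal()
replace y = . if agecat == 6

* Base 6 is empty in the sample: expect an "identifies no observations"
* note plus one level omitted because of collinearity with _cons.
regress y x ib6.agecat

* A populated base level gives one coefficient per remaining level.
regress y x ib3.agecat
```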
PS: This is my first post, so I hope I formatted everything the right way. Apologies in advance if not!
Kind regards,
Wouter