
  • How to retrieve coefficients of all categorical variables?

    I am aware that, in order to avoid perfect multicollinearity, one of the dummies is omitted during estimation. However, I need the coefficients of all categories of all categorical variables for a particular purpose. Even when I use the noconstant option, the issue is not solved. Just as a trial:
    probit migrate geneduc mpce_class sex_dummy marital_dummy reld1 reld2 reld3 sgroupd1 sgroupd2 sgroupd3, noconstant

    note: sgroupd3 omitted because of collinearity.
    Iteration 0: log likelihood = -4807.6688
    Iteration 1: log likelihood = -2568.8321
    Iteration 2: log likelihood = -2534.1998
    Iteration 3: log likelihood = -2534.1416
    Iteration 4: log likelihood = -2534.1416

    Probit regression                             Number of obs =   6,936
                                                  Wald chi2(9)  = 3012.42
    Log likelihood = -2534.1416                   Prob > chi2   =  0.0000

    -------------------------------------------------------------------------------
          migrate | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
          geneduc |  -.0024534   .0126715    -0.19   0.846    -.0272892    .0223823
       mpce_class |   .2612434   .0170237    15.35   0.000     .2278776    .2946092
        sex_dummy |   .7543771   .0415032    18.18   0.000     .6730323    .8357219
    marital_dummy |   .4893651   .0537443     9.11   0.000     .3840282    .5947019
            reld1 |  -2.799126   .1170869   -23.91   0.000    -3.028612     -2.56964
            reld2 |  -2.972429   .1245431   -23.87   0.000    -3.216528    -2.728329
            reld3 |  -3.131233   .2448611   -12.79   0.000    -3.611152    -2.651314
         sgroupd1 |   .4444475   .0911522     4.88   0.000     .2657924     .6231027
         sgroupd2 |   .3533121   .0964313     3.66   0.000     .1643103     .5423139
         sgroupd3 |          0  (omitted)
    -------------------------------------------------------------------------------

    What I mean is that I need the sgroupd3 coefficient as well. Can someone please help me with this?

  • #2
    Saakshi:
    welcome to this forum.
    If the -nocons- option does not help, the omitted level of your categorical variable is collinear with the level of some other predictor included in the right-hand side of your regression equation.
    I'd double check your model specification.
    Kind regards,
    Carlo
    (StataNow 18.5)
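
    To see why -nocons- cannot always rescue the omitted level, here is a minimal sketch (using the auto toy dataset, not the poster's data): when two categorical variables are both entered as complete dummy sets, each set sums to 1 for every observation, so the two sets are perfectly collinear with each other even without a constant, and one level must still be dropped.

    ```stata
    * Minimal sketch with the auto toy dataset (not the poster's data):
    * two complete dummy sets are mutually collinear even under -noconstant-,
    * because repd1+...+repd5 = 1 = ford1+ford2 for every observation.
    sysuse auto, clear
    tab rep78, gen(repd)     // repd1-repd5: full dummy set for rep78
    tab foreign, gen(ford)   // ford1, ford2: full dummy set for foreign
    regress mpg repd1-repd5 ford1 ford2, noconstant
    * Stata still reports one dummy "omitted because of collinearity".
    ```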

    • #3
      Dear Sir, thank you so much. Here sgroupd1, sgroupd2, and sgroupd3 are three social group categories, just as reld1, reld2, and reld3 are three religion categories. I understand that sgroupd3 may be omitted because of perfect collinearity. However, just as the reld3 coefficient could be obtained by using the noconstant option, is there any way to get the coefficient of sgroupd3 as well? My end goal is to get coefficients of ALL variables, and where there are categorical variables I want coefficients of all categories/dummies despite the existence of multicollinearity. Is there any way to do that?

      • #4
        Instead of using the indicator-coded variables reld1, reld2, reld3 and sgroupd1, sgroupd2, sgroupd3, you should use factor-variable notation (see help factor variables) followed by the postestimation commands testparm and margins (where you can also account for multiple comparisons with the options pwcompare and mcompare).
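
        A minimal sketch of this suggestion, assuming (hypothetically) that the underlying categorical variables from which the dummies were generated are named religion and sgroup:

        ```stata
        * Hypothetical variable names: religion and sgroup are assumed to be the
        * categorical variables behind the reld* and sgroupd* dummies.
        probit migrate geneduc mpce_class sex_dummy marital_dummy i.religion i.sgroup
        testparm i.sgroup                                     // joint test of all sgroup levels
        margins i.sgroup, pwcompare(effects) mcompare(sidak)  // pairwise contrasts, Sidak-adjusted
        ```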

        • #5
          Thank you, sir. But even when I use factor notation, that is, i.religion or i.sgroup, I cannot fetch coefficients of all categories; the base category is still dropped.

          • #6
            Saakshi:
            this is expected, as the omission of the reference category protects your analysis from the so-called dummy trap (Dummy variable (statistics) - Wikipedia).
            Kind regards,
            Carlo
            (StataNow 18.5)

            • #7
              Carlo: Yes sir, I have read about the dummy variable trap and am aware of the reason behind the omission of reference categories. However, I wanted to know whether there is any way (or if it is even possible) to fetch coefficients of all categorical variables, including the reference categories.

              • #8
                Saakshi:
                unfortunately, no.
                If the -nocons- option does not help, it may well be that the omitted level of your categorical variable is collinear with the level of some other predictor included in the right-hand side of your regression equation.
                If that is the case, the only way is to go for a different specification of your regression model.
                Kind regards,
                Carlo
                (StataNow 18.5)

                • #9
                  This question has been answered, but others looking at the title of the thread may interpret the question literally. If one uses factor-variable notation as recommended, the names of factor-variable coefficients contain periods. Assuming that one does not use time-series operators, one can use this to identify the categorical variables. Here is how I would go about obtaining their coefficients.


                  Code:
                  sysuse auto, clear
                  regress mpg price i.foreign weight length i.rep78 disp, robust
                  * drop coefficient names without a period; what remains are factor-variable levels
                  local catvars = ustrregexra(" " + "`:colnames(e(b))'"+ " ", "(\s)([^.]*)(\s)", "$1")
                  di "`catvars'"
                  * turn that list into a matrix-subscript expression and extract the submatrix
                  local matelements = subinstr(trim(ustrregexra(" " + "`:colnames(e(b))'"+ " ", "(\s)([^.]*)(\s)", "$1")), " ", `""], e(b)[1, ""', .)
                  mat coefficients = e(b)[1,"`matelements'"]
                  mat list coefficients
                  Res.:

                  Code:
                   di "`catvars'"
                   0b.foreign 1.foreign 1b.rep78 2.rep78 3.rep78 4.rep78 5.rep78
                  
                  .
                  . mat list coefficients
                  
                  coefficients[1,7]
                              0b.          1.         1b.          2.          3.          4.          5.
                         foreign     foreign       rep78       rep78       rep78       rep78       rep78
                  y1           0  -2.7275812           0   .19634757   -.0046563   1.0848491   4.4409442
                  Last edited by Andrew Musau; 29 Feb 2024, 06:27.

                  • #10
                    And following the example in #9, testparm and margins allow you to test the factors i.rep78 and i.foreign and to examine all contrasts of your factor variables:
                    Code:
                    . sysuse auto, clear
                    (1978 automobile data)
                    
                    .
                    . regress mpg price i.foreign weight length i.rep78 displacement, robust
                    
                    Linear regression                               Number of obs     =         69
                                                                    F(9, 59)          =      24.99
                                                                    Prob > F          =     0.0000
                                                                    R-squared         =     0.7182
                                                                    Root MSE          =     3.3432
                    
                    ------------------------------------------------------------------------------
                                 |               Robust
                             mpg | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                           price |   -.000103   .0002439    -0.42   0.674     -.000591    .0003851
                                 |
                         foreign |
                        Foreign  |  -2.727581   1.370427    -1.99   0.051      -5.4698    .0146375
                          weight |  -.0024602   .0031441    -0.78   0.437    -.0087516    .0038312
                          length |  -.1324748   .0839849    -1.58   0.120    -.3005281    .0355785
                                 |
                           rep78 |
                              2  |   .1963476   1.037511     0.19   0.851    -1.879708    2.272403
                              3  |  -.0046563   .8817882    -0.01   0.996     -1.76911    1.759798
                              4  |   1.084849   1.137299     0.95   0.344    -1.190881    3.360579
                              5  |   4.440944   2.036306     2.18   0.033     .3663046    8.515584
                                 |
                    displacement |   .0014018   .0107067     0.13   0.896    -.0200222    .0228257
                           _cons |   53.86674   8.171879     6.59   0.000     37.51485    70.21863
                    ------------------------------------------------------------------------------
                    
                    .
                    . testparm i.foreign
                    
                     ( 1)  1.foreign = 0
                    
                           F(  1,    59) =    3.96
                                Prob > F =    0.0512
                    
                    . margins i.foreign, pwcompare(effects) mcompare(sidak)
                    
                    Pairwise comparisons of predictive margins                  Number of obs = 69
                    Model VCE: Robust
                    
                    Expression: Linear prediction, predict()
                    
                    note: option sidak ignored since there is only one comparison
                    --------------------------------------------------------------------------------------
                                         |            Delta-method    Unadjusted           Unadjusted
                                         |   Contrast   std. err.      t    P>|t|     [95% conf. interval]
                    ---------------------+----------------------------------------------------------------
                                 foreign |
                    Foreign vs Domestic  |  -2.727581   1.370427    -1.99   0.051      -5.4698    .0146375
                    --------------------------------------------------------------------------------------
                    
                    .
                    . testparm i.rep78
                    
                     ( 1)  2.rep78 = 0
                     ( 2)  3.rep78 = 0
                     ( 3)  4.rep78 = 0
                     ( 4)  5.rep78 = 0
                    
                           F(  4,    59) =    1.65
                                Prob > F =    0.1731
                    
                    . margins i.rep78, pwcompare(effects) mcompare(sidak)
                    
                    Pairwise comparisons of predictive margins                  Number of obs = 69
                    Model VCE: Robust
                    
                    Expression: Linear prediction, predict()
                    
                    ---------------------------
                                 |    Number of
                                 |  comparisons
                    -------------+-------------
                           rep78 |           10
                    ---------------------------
                    
                    ------------------------------------------------------------------------------
                                 |            Delta-method      Sidak                Sidak
                                 |   Contrast   std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                           rep78 |
                         2 vs 1  |   .1963476   1.037511     0.19   1.000    -2.820959    3.213655
                         3 vs 1  |  -.0046563   .8817882    -0.01   1.000    -2.569087    2.559774
                         4 vs 1  |   1.084849   1.137299     0.95   0.985    -2.222661    4.392359
                         5 vs 1  |   4.440944   2.036306     2.18   0.286    -1.481074    10.36296
                         3 vs 2  |  -.2010039   .7003784    -0.29   1.000    -2.237855    1.835848
                         4 vs 2  |   .8885015   1.031166     0.86   0.993     -2.11035    3.887353
                         5 vs 2  |   4.244597   1.942419     2.19   0.284    -1.404378    9.893571
                         4 vs 3  |   1.089505   .8382679     1.30   0.891    -1.348358    3.527369
                         5 vs 3  |   4.445601   1.925681     2.31   0.220    -1.154694     10.0459
                         5 vs 4  |   3.356095   2.019108     1.66   0.658    -2.515905    9.228096
                    ------------------------------------------------------------------------------
                    Last edited by Dirk Enzmann; 29 Feb 2024, 07:11. Reason: Added (effects) to pwcompare

                    • #11
                      Just seeing this thread. Andrew Musau's deployment of -ustrregexra- is elegant indeed. But I find the regular expression syntax challenging to remember or to recreate. (See Stata's help for "regular expression".) For the general problem of extracting a submatrix using wildcards, I have posted the program -mluwild-. Using -mluwild-, one can replicate Andrew's list of factor-variable coefficient names as follows:

                      Code:
                      sysuse auto, clear
                      regress mpg price i.foreign weight length i.rep78 disp, robust
                      mluwild e(b)["*","*.*"] , verbose
                      local fvnames : colnames r(submat)
                      di "`fvnames'"
                      The output looks like this:

                      Code:
                      . sysuse auto, clear
                      (1978 automobile data)
                      
                      . regress mpg price i.foreign weight length i.rep78 disp, robust
                      
                      <snip>
                      
                      . mluwild e(b)["*","*.*"] , verbose
                      
                                   |        0b.         1.        1b.         2.         3.         4.         5.
                                   |   foreign    foreign      rep78      rep78      rep78      rep78      rep78 
                      -------------+-----------------------------------------------------------------------------
                                y1 |         0  -2.727581          0   .1963476  -.0046563   1.084849   4.440944 
                      
                      . local fvnames : colnames r(submat)
                      
                      . di "`fvnames'"
                      0b.foreign 1.foreign 1b.rep78 2.rep78 3.rep78 4.rep78 5.rep78
                      One can find, describe and choose to install mluwild here:

                      Code:
                      net describe mluwild, from("http://digital.cgdev.org/doc/stata/MO/Misc")
                      The help file for -mluwild- describes other use cases. Those using a Stata version earlier than 16 must also install -mlu-.
