
  • How to retrieve coefficients of all categorical variables?

    I am aware that, in order to avoid perfect multicollinearity, one of the dummies is omitted during estimation. However, I need the coefficients of all categories of all categorical variables for a particular purpose. Even when I use the noconstant option, the issue is not solved. Just as a trial:
    probit migrate geneduc mpce_class sex_dummy marital_dummy reld1 reld2 reld3 sgroupd1 sgroupd2 sgroupd3, noconstant

    note: sgroupd3 omitted because of collinearity.
    Iteration 0: log likelihood = -4807.6688
    Iteration 1: log likelihood = -2568.8321
    Iteration 2: log likelihood = -2534.1998
    Iteration 3: log likelihood = -2534.1416
    Iteration 4: log likelihood = -2534.1416

    Probit regression                             Number of obs =   6,936
                                                  Wald chi2(9)  = 3012.42
    Log likelihood = -2534.1416                   Prob > chi2   =  0.0000

    -------------------------------------------------------------------------------
          migrate | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    --------------+----------------------------------------------------------------
          geneduc |  -.0024534   .0126715    -0.19   0.846    -.0272892    .0223823
       mpce_class |   .2612434   .0170237    15.35   0.000     .2278776    .2946092
        sex_dummy |   .7543771   .0415032    18.18   0.000     .6730323    .8357219
    marital_dummy |   .4893651   .0537443     9.11   0.000     .3840282    .5947019
            reld1 |  -2.799126   .1170869   -23.91   0.000    -3.028612     -2.56964
            reld2 |  -2.972429   .1245431   -23.87   0.000    -3.216528    -2.728329
            reld3 |  -3.131233   .2448611   -12.79   0.000    -3.611152    -2.651314
         sgroupd1 |   .4444475   .0911522     4.88   0.000     .2657924     .6231027
         sgroupd2 |   .3533121   .0964313     3.66   0.000     .1643103     .5423139
         sgroupd3 |          0  (omitted)
    -------------------------------------------------------------------------------

    What I mean is that I need the sgroupd3 coefficient as well. Can someone please help me with this?

  • #2
    Saakshi:
    welcome to this forum.
    If the -nocons- option does not help, the omitted level of your categorical variable is collinear with the level of some other predictor included in the right-hand side of your regression equation.
    I'd double check your model specification.
    Kind regards,
    Carlo
    (StataNow 18.5)
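
    To see why -nocons- cannot always rescue the omitted level, here is a minimal sketch (using the auto toy dataset, not the poster's data): when two categorical variables are both entered as complete dummy sets, each set sums to 1 for every observation, so the two sets are perfectly collinear with each other even without a constant, and one level must still be dropped.

    ```stata
    * Minimal sketch with the auto toy dataset (not the poster's data):
    * two complete dummy sets are mutually collinear even under -noconstant-,
    * because repd1+...+repd5 = 1 = ford1+ford2 for every observation.
    sysuse auto, clear
    tab rep78, gen(repd)     // repd1-repd5: full dummy set for rep78
    tab foreign, gen(ford)   // ford1, ford2: full dummy set for foreign
    regress mpg repd1-repd5 ford1 ford2, noconstant
    * Stata still reports one dummy "omitted because of collinearity".
    ```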

    • #3
      Dear Sir, thank you so much. Here sgroupd1, sgroupd2, and sgroupd3 are three social group categories, just as reld1, reld2, and reld3 are three religion categories. I understand that sgroupd3 may be omitted because of perfect collinearity. However, just as the reld3 coefficient could be obtained by using the noconstant option, is there any way to get the coefficient of sgroupd3 as well? My end goal is to get coefficients of ALL variables, and where there are categorical variables I want coefficients of all categories/dummies despite the existence of multicollinearity. Is there any way to do that?

      • #4
        Instead of using the indicator-coded variables reld1, reld2, reld3 and sgroupd1, sgroupd2, sgroupd3, you should use factor-variable notation (see help factor variables) followed by the postestimation commands testparm and margins (where you can also account for multiple comparisons with the options pwcompare and mcompare).
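
        A minimal sketch of this suggestion, assuming (hypothetically) that the underlying categorical variables from which the dummies were generated are named religion and sgroup:

        ```stata
        * Hypothetical variable names: religion and sgroup are assumed to be the
        * categorical variables behind the reld* and sgroupd* dummies.
        probit migrate geneduc mpce_class sex_dummy marital_dummy i.religion i.sgroup
        testparm i.sgroup                                     // joint test of all sgroup levels
        margins i.sgroup, pwcompare(effects) mcompare(sidak)  // pairwise contrasts, Sidak-adjusted
        ```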

        • #5
          Thank you, sir. But even when I use factor notation, that is, i.religion or i.sgroup, I cannot fetch coefficients of all categories; the base category is still dropped.

          • #6
            Saakshi:
            this is expected, as the omission of the reference category protects your analysis from the so-called dummy trap (Dummy variable (statistics) - Wikipedia).
            Kind regards,
            Carlo
            (StataNow 18.5)

            • #7
              Carlo: Yes sir, I have read about the dummy variable trap and am aware of the reason behind the omission of reference categories. However, I wanted to know whether there is any way (or if it is even possible) to fetch coefficients of all categorical variables, including the reference categories.

              • #8
                Saakshi:
                unfortunately, no.
                If the -nocons- option does not help, it may well be that the omitted level of your categorical variable is collinear with the level of some other predictor included in the right-hand side of your regression equation.
                If that is the case, the only way is to go for a different specification of your regression model.
                Kind regards,
                Carlo
                (StataNow 18.5)

                • #9
                  This question has been answered, but others looking at the title of the thread may interpret the question literally. If one uses factor-variable notation as recommended, the names of factor-variable coefficients contain periods. Assuming that one does not use time-series operators, one can use this to identify the categorical variables. Here is how I would go about obtaining their coefficients.


                  Code:
                  sysuse auto, clear
                  regress mpg price i.foreign weight length i.rep78 disp, robust
                  * drop coefficient names without a period; what remains are factor-variable levels
                  local catvars = ustrregexra(" " + "`:colnames(e(b))'"+ " ", "(\s)([^.]*)(\s)", "$1")
                  di "`catvars'"
                  * turn that list into a matrix-subscript expression and extract the submatrix
                  local matelements = subinstr(trim(ustrregexra(" " + "`:colnames(e(b))'"+ " ", "(\s)([^.]*)(\s)", "$1")), " ", `""], e(b)[1, ""', .)
                  mat coefficients = e(b)[1,"`matelements'"]
                  mat list coefficients
                  Res.:

                  Code:
                   di "`catvars'"
                   0b.foreign 1.foreign 1b.rep78 2.rep78 3.rep78 4.rep78 5.rep78
                  
                  .
                  . mat list coefficients
                  
                  coefficients[1,7]
                              0b.          1.         1b.          2.          3.          4.          5.
                         foreign     foreign       rep78       rep78       rep78       rep78       rep78
                  y1           0  -2.7275812           0   .19634757   -.0046563   1.0848491   4.4409442
                  Last edited by Andrew Musau; 29 Feb 2024, 06:27.

                  • #10
                    And following the example in #9, testparm and margins allow you to test the factors i.rep78 and i.foreign and to examine all contrasts of your factor variables:
                    Code:
                    . sysuse auto, clear
                    (1978 automobile data)
                    
                    .
                    . regress mpg price i.foreign weight length i.rep78 displacement, robust
                    
                    Linear regression                               Number of obs     =         69
                                                                    F(9, 59)          =      24.99
                                                                    Prob > F          =     0.0000
                                                                    R-squared         =     0.7182
                                                                    Root MSE          =     3.3432
                    
                    ------------------------------------------------------------------------------
                                 |               Robust
                             mpg | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                           price |   -.000103   .0002439    -0.42   0.674     -.000591    .0003851
                                 |
                         foreign |
                        Foreign  |  -2.727581   1.370427    -1.99   0.051      -5.4698    .0146375
                          weight |  -.0024602   .0031441    -0.78   0.437    -.0087516    .0038312
                          length |  -.1324748   .0839849    -1.58   0.120    -.3005281    .0355785
                                 |
                           rep78 |
                              2  |   .1963476   1.037511     0.19   0.851    -1.879708    2.272403
                              3  |  -.0046563   .8817882    -0.01   0.996     -1.76911    1.759798
                              4  |   1.084849   1.137299     0.95   0.344    -1.190881    3.360579
                              5  |   4.440944   2.036306     2.18   0.033     .3663046    8.515584
                                 |
                    displacement |   .0014018   .0107067     0.13   0.896    -.0200222    .0228257
                           _cons |   53.86674   8.171879     6.59   0.000     37.51485    70.21863
                    ------------------------------------------------------------------------------
                    
                    .
                    . testparm i.foreign
                    
                     ( 1)  1.foreign = 0
                    
                           F(  1,    59) =    3.96
                                Prob > F =    0.0512
                    
                    . margins i.foreign, pwcompare(effects) mcompare(sidak)
                    
                    Pairwise comparisons of predictive margins                  Number of obs = 69
                    Model VCE: Robust
                    
                    Expression: Linear prediction, predict()
                    
                    note: option sidak ignored since there is only one comparison
                    --------------------------------------------------------------------------------------
                                         |            Delta-method    Unadjusted           Unadjusted
                                         |   Contrast   std. err.      t    P>|t|     [95% conf. interval]
                    ---------------------+----------------------------------------------------------------
                                 foreign |
                    Foreign vs Domestic  |  -2.727581   1.370427    -1.99   0.051      -5.4698    .0146375
                    --------------------------------------------------------------------------------------
                    
                    .
                    . testparm i.rep78
                    
                     ( 1)  2.rep78 = 0
                     ( 2)  3.rep78 = 0
                     ( 3)  4.rep78 = 0
                     ( 4)  5.rep78 = 0
                    
                           F(  4,    59) =    1.65
                                Prob > F =    0.1731
                    
                    . margins i.rep78, pwcompare(effects) mcompare(sidak)
                    
                    Pairwise comparisons of predictive margins                  Number of obs = 69
                    Model VCE: Robust
                    
                    Expression: Linear prediction, predict()
                    
                    ---------------------------
                                 |    Number of
                                 |  comparisons
                    -------------+-------------
                           rep78 |           10
                    ---------------------------
                    
                    ------------------------------------------------------------------------------
                                 |            Delta-method      Sidak                Sidak
                                 |   Contrast   std. err.      t    P>|t|     [95% conf. interval]
                    -------------+----------------------------------------------------------------
                           rep78 |
                         2 vs 1  |   .1963476   1.037511     0.19   1.000    -2.820959    3.213655
                         3 vs 1  |  -.0046563   .8817882    -0.01   1.000    -2.569087    2.559774
                         4 vs 1  |   1.084849   1.137299     0.95   0.985    -2.222661    4.392359
                         5 vs 1  |   4.440944   2.036306     2.18   0.286    -1.481074    10.36296
                         3 vs 2  |  -.2010039   .7003784    -0.29   1.000    -2.237855    1.835848
                         4 vs 2  |   .8885015   1.031166     0.86   0.993     -2.11035    3.887353
                         5 vs 2  |   4.244597   1.942419     2.19   0.284    -1.404378    9.893571
                         4 vs 3  |   1.089505   .8382679     1.30   0.891    -1.348358    3.527369
                         5 vs 3  |   4.445601   1.925681     2.31   0.220    -1.154694     10.0459
                         5 vs 4  |   3.356095   2.019108     1.66   0.658    -2.515905    9.228096
                    ------------------------------------------------------------------------------
                    Last edited by Dirk Enzmann; 29 Feb 2024, 07:11. Reason: Added (effects) to pwcompare

                    • #11
                      Just seeing this thread. Andrew Musau's deployment of -ustrregexra- is elegant indeed. But I find the regular expression syntax challenging to remember or to recreate. (See Stata's help for "regular expression".) For the general problem of extracting a submatrix using wildcards, I have posted the program -mluwild-. Using -mluwild-, one can replicate Andrew's list of factor-variable coefficient names as follows:

                      Code:
                      sysuse auto, clear
                      regress mpg price i.foreign weight length i.rep78 disp, robust
                      mluwild e(b)["*","*.*"] , verbose
                      local fvnames : colnames r(submat)
                      di "`fvnames'"
                      The output looks like this:

                      Code:
                      . sysuse auto, clear
                      (1978 automobile data)
                      
                      . regress mpg price i.foreign weight length i.rep78 disp, robust
                      
                      <snip>
                      
                      . mluwild e(b)["*","*.*"] , verbose
                      
                                   |        0b.         1.        1b.         2.         3.         4.         5.
                                   |   foreign    foreign      rep78      rep78      rep78      rep78      rep78 
                      -------------+-----------------------------------------------------------------------------
                                y1 |         0  -2.727581          0   .1963476  -.0046563   1.084849   4.440944 
                      
                      . local fvnames : colnames r(submat)
                      
                      . di "`fvnames'"
                      0b.foreign 1.foreign 1b.rep78 2.rep78 3.rep78 4.rep78 5.rep78
                      One can find, describe and choose to install mluwild here:

                      Code:
                      net describe mluwild, from("http://digital.cgdev.org/doc/stata/MO/Misc")
                      The help file for -mluwild- describes other use cases. Those using a Stata version earlier than 16 must also install -mlu-.
