  • Linear regression output not showing certain categories

    Hello all,
    I have a dependent variable called bp (blood pressure)

    I have a continuous variable called "hdl", which runs from 1 to 100.
    I recoded it into three categories (low, medium, and high) in a new variable called "hdl_tertile".

    I ran a linear regression (regress bp hdl), which produced normal Stata output.
    Then I ran a second regression (regress bp i.hdl_tertile), but the output shows a coefficient only for the high category.

    Am I missing something?

    Thanks!
    Al


  • #2
    You think you have three tertiles of hdl. But Stata seems to think otherwise. Observations with bp or hdl missing don't count, as they are not part of the estimation sample. So run:

    Code:
    tab hdl_tertile if !missing(hdl, bp)
    I'm guessing that you'll find that there are actually only two "tertiles." This may arise because you used incorrect code to create the hdl_tertile variable. Or it may be that the distribution of hdl is such that it is not possible to split it into three tertiles due to tied values.
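
    A minimal sketch of that check, assuming the tertiles are built with -xtile- (the original post does not say how hdl_tertile was created, and hdl_t3 is a hypothetical variable name). Tabulating on e(sample) after the regression counts only the observations Stata actually used:

    Code:
    * Assumption: tertiles built with -xtile-; hdl_t3 is a hypothetical name
    xtile hdl_t3 = hdl, nq(3)
    regress bp i.hdl_t3
    * e(sample) marks the observations used in the most recent estimation
    tabulate hdl_t3 if e(sample)
    If the table lists fewer than three levels, tied values or missing data have collapsed a category.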



    • #3
      Hi Mr. Schechter,

      Thanks for the advice!
      I'm having the same problem with another regression, though (and, in fact, with all of my regressions!).

      In the following output, my DV is called iopcc_out (which is continuous) and my IV is called ldl3_quart (a categorical variable with 4 quartiles, which I created from the continuous variable ldl3).

      This is the output I'm getting. Why can't I see the beta and p-value for the first row (i.e., where is my first quartile)? I know that in logistic regression you have to have a reference category, but I didn't think this was the case for linear regression.

      Thanks for any and all help!

      regress iopcc_out i.ldl3_quart

         iopcc_out |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
        ldl3_quart |
                2  |   .1445485   .1113414     1.30   0.194    -.0737154    .3628124
                3  |   .2811891   .1141244     2.46   0.014     .0574697    .5049086
                4  |   .5975637   .1285405     4.65   0.000     .3455844     .849543
                   |
             _cons |   16.60205   .0775615   214.05   0.000        16.45    16.75409






      • #4
        Alan:
        Stata omits the first level of your categorical variable to avoid the so-called dummy trap (https://en.wikipedia.org/wiki/Dummy_...le_(statistics)).
        See -help fvvarlist- for how to set the reference category.
        Kind regards,
        Carlo
        (StataNow 18.5)
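
        A short sketch of the factor-variable syntax for changing the reference category, using the variable names from #3. Choosing quartile 4 as the base is only an illustration, not a recommendation:

        Code:
        * Make quartile 4 the base level for this one command
        regress iopcc_out ib4.ldl3_quart
        * Same result here, since 4 is the highest level: use the last level as the base
        regress iopcc_out ib(last).ldl3_quart
        * Or set the base level persistently for the variable
        fvset base 4 ldl3_quart
        regress iopcc_out i.ldl3_quart
        The fitted model is the same in each case; only the category against which the other coefficients are contrasted changes.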



        • #5
          The situation you describe in #1 is different from the one you describe in #3. What's going on in #3 is normal, as Carlo has pointed out in #4. It is the normal omission of the reference category which occurs in all regressions, not just logistic.

          But in #1 you say you have a 3-level variable and you are only getting output for 1 category. So for some reason, in addition to omitting the reference category, Stata is also dropping another category of the variable. That means that there is something peculiar in your data. Perhaps there is additional collinearity between the tertile variable and some other variable in your model, or perhaps your tertile variable actually contains only 2 levels because the data could not be partitioned into tertiles. Without seeing example data, the regression command, and the regression output, nothing more concrete can be said.



          • #6
            Hello, I am having a problem similar to the one in #1. I have created a category with three states, but when I use i.threestates, I get two (rather than one) omitted categories: South Dakota (46) is omitted, Nevada (32) is included, but the coefficient for Delaware (10) is not shown at all. What could explain this?

            Code:
            gen threestates = .
                replace threestates = 1 if stnamebr== "South Dakota"  
                replace threestates = 1 if stnamebr== "Nevada"
                replace threestates = 1 if stnamebr== "Delaware"
                replace threestates = 0 if threestates != 1
                
            randomtag if threestates ==1, count(20) gen(pick)
            
            reg indepvar control1 control2 control3 i.stnumbr if threestates == 1
            
            dataex indepvar control1 control2 control3 stnumbr stnamebr threestates if pick==1

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float indepvar double(control1 control2) float control3 byte stnumbr str48 stnamebr float threestates
              428.806 .08699999749660492 16 11.583333 10 "Delaware"     1
             365.7428               .087 16 11.583333 10 "Delaware"     1
             52.70919                  0  0 10.583333 32 "Nevada"       1
            70.018585                  0  0 13.416667 32 "Nevada"       1
            70.018585                  0  0 13.416667 32 "Nevada"       1
             77.78766                  0  0 13.416667 32 "Nevada"       1
             77.78766                  0  0 13.416667 32 "Nevada"       1
             90.47867                  0  0 13.416667 32 "Nevada"       1
             90.47867                  0  0 13.416667 32 "Nevada"       1
             377.8177                  0  0 13.583333 46 "South Dakota" 1
             377.8177                  0  0 13.583333 46 "South Dakota" 1
            446.42615                  0  0 13.583333 46 "South Dakota" 1
            446.42615                  0  0 13.583333 46 "South Dakota" 1
              527.094                  0  0 13.583333 46 "South Dakota" 1
             525.9508                  0  0 13.583333 46 "South Dakota" 1
             614.1753                  0  0 13.583333 46 "South Dakota" 1
             614.1753                  0  0 13.583333 46 "South Dakota" 1
             685.7753                  0  0 13.583333 46 "South Dakota" 1
             685.7753                  0  0 13.583333 46 "South Dakota" 1
             685.7753                  0  0 13.583333 46 "South Dakota" 1
            end
            I was actually doing an event study regression (using user-created command eventdd) when I noticed this, and I just changed the example to a regular "reg" command to try and see if I could isolate the problem.


            Last edited by John Singer; 06 Mar 2022, 14:50.



            • #7
              Clyde Schechter I also tried
              Code:
                tab threestates if !missing(threestates, regress)
              and got r(111) regress not found. This is on Stata SE 17.0
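
              The r(111) error arises because missing() expects variable names or expressions, and there is no variable called regress in the dataset. A sketch of the check that was presumably intended, listing the model's own variables instead (this reading of the intent is an assumption, not something stated in the thread):

              Code:
              * Tabulate the state variable over observations with complete data on every model variable
              tabulate stnumbr if !missing(indepvar, control1, control2, control3, stnumbr) & threestates == 1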



              • #8
                John:
                This is perfect collinearity between one level of -stnumbr- and another predictor, in addition to the reference category being omitted to protect against the dummy trap.
                It seems to be the very same situation described by Clyde's last statement in his reply #5 above.
                Kind regards,
                Carlo
                (StataNow 18.5)



                • #9
                  Your control2 variable is constant within state, so it gets omitted. Your variable control1 is not exactly, but almost exactly, constant within state: there is just an extremely small difference between the two values of control1 shown for Delaware. This is close enough that Stata's algorithm for dealing with collinearity treats it as if it were exact collinearity. You can see this more clearly if you run -regress control1 i.stnumbr if threestates == 1-. The output will show you that R-squared = 1. So something has to go: you can't have all three states and control1 in the model. Stata chose to keep control1 and drop one of the states. If you prefer to drop control1, just leave it out of the model. Then your model becomes -regress indepvar control3 i.stnumbr if threestates == 1- (control2 is omitted either way), and that preserves all three states (with one omitted as the reference level).

                  This is, as Carlo notes, an instance of my last paragraph in #5.
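
                  The collinearity described here can be checked directly. A sketch using the variable names from #6; the display format is only a choice made to expose the tiny difference in control1:

                  Code:
                  * Min and max of each control within state; equal values mean the variable is constant within state
                  tabstat control1 control2 if threestates == 1, by(stnumbr) statistics(min max) format(%18.0g)
                  * Regress control1 on the state indicators; an R-squared of (essentially) 1 signals collinearity
                  regress control1 i.stnumbr if threestates == 1
                  display e(r2)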



                  • #10
                    Dear Clyde and Carlo,
                    Thank you for explaining this so clearly! I was just confused about why the regression output didn't show a 0 (omitted) row for Delaware. It does say "note: control2 omitted because of collinearity." and "note: 46.stnumbr omitted because of collinearity", but it doesn't say anything about Delaware being omitted, which I thought was strange.



                    • #11
                      When Stata handles a factor variable, i.X, one category of X is automatically omitted as a base category. Because that is routine, no messages are issued about it, and it is simply left out of the outputs, with no row saying (omitted). It is only when Stata omits additional categories that you get a warning message and an (omitted) row. Stata is calling your attention to the unusual situation, but keeping silent in the routine situation.



                      • #12
                        John:
                        As an aside to Clyde's helpful explanation, you may feel more comfortable with the -allbaselevels- option, which uses a row to show the omitted reference category (see the difference between the -regress- output tables with and without this option):
                        Code:
                        . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
                        (1978 automobile data)
                        
                        . regress price i.foreign, allbaselevels
                        
                              Source |       SS           df       MS      Number of obs   =        74
                        -------------+----------------------------------   F(1, 72)        =      0.17
                               Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
                            Residual |   633558013        72  8799416.85   R-squared       =    0.0024
                        -------------+----------------------------------   Adj R-squared   =   -0.0115
                               Total |   635065396        73  8699525.97   Root MSE        =    2966.4
                        
                        ------------------------------------------------------------------------------
                               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                             foreign |
                           Domestic  |          0  (base)
                            Foreign  |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
                                     |
                               _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
                        ------------------------------------------------------------------------------
                        
                        . regress price i.foreign
                        
                              Source |       SS           df       MS      Number of obs   =        74
                        -------------+----------------------------------   F(1, 72)        =      0.17
                               Model |  1507382.66         1  1507382.66   Prob > F        =    0.6802
                            Residual |   633558013        72  8799416.85   R-squared       =    0.0024
                        -------------+----------------------------------   Adj R-squared   =   -0.0115
                               Total |   635065396        73  8699525.97   Root MSE        =    2966.4
                        
                        ------------------------------------------------------------------------------
                               price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
                        -------------+----------------------------------------------------------------
                             foreign |
                            Foreign  |   312.2587   754.4488     0.41   0.680    -1191.708    1816.225
                               _cons |   6072.423    411.363    14.76   0.000     5252.386     6892.46
                        ------------------------------------------------------------------------------
                        
                        .
                        Kind regards,
                        Carlo
                        (StataNow 18.5)
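
                        To get a similar display without typing an option on every command, base levels can also be switched on as a setting. A minimal sketch, assuming the auto dataset that ships with Stata:

                        Code:
                        * Show base levels in estimation tables for the rest of the session
                        set showbaselevels on
                        sysuse auto, clear
                        regress price i.foreign
                        Adding the -permanently- option to -set showbaselevels- keeps the setting across sessions.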



                        • #13
                          Dear Carlo and Clyde,

                          Thank you for your explanations. I do indeed like the -allbaselevels- option because it'll give me confidence that I've specified the regression correctly. As an aside, I'd checked back to see if I'd gotten responses, and I feel like in the past there'd sometimes be a red number near the top of the Stata webpage, next to the name. I noticed these responses because the subject was bolded when I clicked on "Recent Posts," but there wasn't any kind of red number to indicate I'd gotten a response. Is this a new change, or am I misremembering? Aside from maybe email notifications, what's the best way to check back to see if I got a response?



                          • #14
                            #13 Subscribe to a thread to get email notifications of any posts to that thread. Select the button +SUBSCRIBE
