
  • What does ", noconstant" do to standard error?

    I tried to suppress the constant in a regression; the coefficient is correct, but the standard error appears to be way off. Here is an example:


    Code:
    . reg mpg g1

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              g1 |  -.2985075   4.240777    -0.07   0.944    -8.763134    8.166119
           _cons |   21.29851   .7219978    29.50   0.000     19.85739    22.73962
    ------------------------------------------------------------------------------

    Code:
    . reg mpg g1, nocon

    ------------------------------------------------------------------------------
             mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              g1 |         21   15.51399     1.35   0.180    -9.957681    51.95768
    ------------------------------------------------------------------------------

    What causes the difference in the SEs? And how can I get a correct SE while suppressing the constant?

    Thanks!

  • #2
    They are entirely different models, so it is not strange that the coefficients and standard errors are different. It looks like it doesn't make much sense to restrict mpg to be 0 when g1 is 0.

    You can visually inspect what's going on by running these lines.
    Code:
    scatter mpg g1 || lfitci mpg g1
    scatter mpg g1 || lfitci mpg g1, estopts(nocons)



    • #3
      Thank you so much! The visual inspection works great!

      Could you help me understand a little more about what you mean by "It looks like it doesn't make much sense to restrict mpg to be 0 when g1 is 0"? Is this how "nocons" operates?

      I really thought both models were the same, except that the noconstant one adds the coefficient to the constant.



      • #4
        Removing a coefficient is another way of saying that it is set to 0.
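
        One way to see this in Stata is to impose the restriction explicitly (a sketch using -cnsreg- on the auto data, with foreign standing in for your g1; the constraint number is arbitrary):
        Code:
        sysuse auto, clear
        constraint define 1 _cons = 0
        * constrained least squares with the constant set to 0 ...
        cnsreg mpg foreign, constraints(1)
        * ... reproduces the fit from suppressing the constant
        regress mpg foreign, noconstant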
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------



        • #5
          With regards to model specification, the difference is this:

          y = a + bx + e
          y = bx + e

          where a is a constant, and e is an error term.

          The first model has a constant; the second one doesn't. In the first case, the expected value of y is a + bx; in the second case it's bx. This means that in the regression without a constant, the expected value of y when x is 0 is 0. In other words, the regression line is restricted to run through the origin if you were to plot it. In the model with a constant, the expected value of y when x is 0 is estimated to fit the data best. In most cases, this makes the most sense.

          Somebody please correct me if I'm wrong, but the only reason to restrict the model with the nocons option is when you know beforehand that y should be 0 when x is 0, or when you find that, in the model with a constant, the constant is insignificantly different from 0.
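
          A quick simulation sketch of that point (the true intercept 10, slope 2, and error spread are made up for illustration):
          Code:
          clear
          set obs 100
          set seed 12345
          generate double x = runiform(0, 10)
          generate double y = 10 + 2*x + rnormal(0, 2)
          * with a constant: recovers an intercept near 10 and a slope near 2
          regress y x
          * without a constant: the line is forced through the origin, so the
          * slope is pulled upward to compensate for the missing intercept
          regress y x, noconstant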



          • #6
            Maarten Buis Thanks!

            Wouter Wakker Thank you for the explanation! It definitely makes sense. I guess the estimated standard error in the 2nd model is inaccurate then.



            • #7
              It's perhaps worth underlining that many of the simplest laws of physical science that you did or didn't appreciate in high or secondary school are of the form

              response = constant * stimulus

              Even if the functional form is of that kind, measurements rarely if ever yield negative responses when the stimulus is nearly zero, even though the implied error structure says they should.

              Some of those laws turn out to be just empirical approximations, not that far away from anything in, say, social, environmental or medical statistics: just as phenomenological and not based on anything deeper.

              I often encounter power functions (some say "power laws") of the form y = ax^b with good reason to suppose that y -> 0 as x -> 0. But even there it is often best procedure to work with

              log y = log a + b log x

              so the noconstant option isn't needed there either.

              In practice, when that option is used, we may be confident that it's the right form in principle. But when measured values are far from zero, there can be different views on what makes sense. Fitting a regression with a constant allows the data to signal that the intercept is close to zero whenever that is the case.
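
              As a sketch of that log-scale fit (simulated data; the values a = 2, b = 0.5 and the multiplicative error are assumptions for illustration):
              Code:
              clear
              set obs 200
              set seed 2020
              generate double x = runiform(0.1, 10)
              * power function with multiplicative error, so the log form is exact
              generate double y = 2 * x^0.5 * exp(rnormal(0, 0.1))
              generate double lny = ln(y)
              generate double lnx = ln(x)
              * _cons estimates log a and the slope estimates b; noconstant is not needed
              regress lny lnx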



              • #8
                It makes sense to exclude the constant if all variables in the model (both y and x) have a mean of zero, e.g. because the mean has been removed in a prior step. If you were to include a constant in that model, it would turn out to be zero anyway.
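
                A sketch of that case with the auto data (the dm_ variable names are made up):
                Code:
                sysuse auto, clear
                quietly summarize price
                generate double dm_price = price - r(mean)
                quietly summarize mpg
                generate double dm_mpg = mpg - r(mean)
                * the constant is zero up to rounding error
                regress dm_price dm_mpg
                * same slope; the SE changes only through the degrees of freedom
                regress dm_price dm_mpg, noconstant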

                It is also meaningful to exclude a constant if you include a full set of dummy variables (including the base category), e.g. a dummy variable for foreign cars and one for domestic cars, which might occasionally help with the interpretation of the coefficients. The model itself does not change. It is just a reparameterization.
                https://www.kripfganz.de/stata/



                • #9
                  Originally posted by Peter Li View Post
                  I guess the estimated standard error in the 2nd model is inaccurate then.
                  No, I wouldn't call it inaccurate. It's large because there's a high degree of uncertainty about the parameter estimate: your specified model is a very poor fit to the data.

                  Forcing the regression line through the origin is a terrible model. The high SE for the slope is telling you that, given this constraint, there is a wide range of values of the slope that give a similarly poor fit.
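
                  You can check the mechanics by hand: in a through-origin regression the slope SE is sqrt(RSS/df) / sqrt(sum of x^2), so a poor fit inflates RSS and with it the SE. A sketch on the auto data (foreign stands in for g1):
                  Code:
                  sysuse auto, clear
                  quietly regress mpg foreign, noconstant
                  display _se[foreign]
                  * recompute the SE from the residual sum of squares and sum of x^2
                  quietly generate double x2 = foreign^2
                  quietly summarize x2
                  display sqrt(e(rss)/e(df_r)) / sqrt(r(sum))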



                  • #10
                    Thanks everyone for the comments/advice!

                    Sebastian Kripfganz The second scenario is exactly why I'm using noconstant! Do you know any better ways to get the coefficients of the omitted categories without sacrificing the SEs? I'm trying to plot all coefficients with CIs. Thanks!



                    • #11
                      If you are using the full set of dummy variables then, assuming you are using -regress-, you should be using the "hascons" option, not the "nocons" option. Note, however, that this does not appear to be consistent with what you wrote above.



                      • #12
                        I do not see the concern about standard errors in this case. Depending on how you want to interpret the coefficients, you choose either the first or the second of the following specifications. Standard errors are accurate in both cases.
                        Code:
                        . sysuse auto
                        (1978 Automobile Data)
                        
                        . reg price mpg i.foreign
                        
                              Source |       SS           df       MS      Number of obs   =        74
                        -------------+----------------------------------   F(2, 71)        =     14.07
                               Model |   180261702         2  90130850.8   Prob > F        =    0.0000
                            Residual |   454803695        71  6405685.84   R-squared       =    0.2838
                        -------------+----------------------------------   Adj R-squared   =    0.2637
                               Total |   635065396        73  8699525.97   Root MSE        =    2530.9
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
                                     |
                             foreign |
                            Foreign  |   1767.292    700.158     2.52   0.014     371.2169    3163.368
                               _cons |   11905.42   1158.634    10.28   0.000     9595.164    14215.67
                        ------------------------------------------------------------------------------
                        
                        . reg price mpg ibn.foreign, hascons
                        
                              Source |       SS           df       MS      Number of obs   =        74
                        -------------+----------------------------------   F(2, 71)        =     14.07
                               Model |   180261702         2  90130850.8   Prob > F        =    0.0000
                            Residual |   454803695        71  6405685.84   R-squared       =    0.2838
                        -------------+----------------------------------   Adj R-squared   =    0.2637
                               Total |   635065396        73  8699525.97   Root MSE        =    2530.9
                        
                        ------------------------------------------------------------------------------
                               price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                                 mpg |  -294.1955   55.69172    -5.28   0.000    -405.2417   -183.1494
                                     |
                             foreign |
                           Domestic  |   11905.42   1158.634    10.28   0.000     9595.164    14215.67
                            Foreign  |   13672.71   1481.406     9.23   0.000     10718.87    16626.55
                        ------------------------------------------------------------------------------
                        Edit: Rich is right. For the coefficients and standard errors, using hascons or nocons does not make a difference if you have a full set of dummy variables. But it matters for other quantities such as the R-squared.
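
                        For comparison, a sketch of the nocons variant on the same data (output omitted; the coefficient table matches the one just above, but quantities such as the R-squared are then computed from uncentered sums of squares):
                        Code:
                        reg price mpg ibn.foreign, nocons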
                        Last edited by Sebastian Kripfganz; 11 Nov 2020, 14:48.
                        https://www.kripfganz.de/stata/



                        • #13
                          May I please follow up with a question:

                          Should I use noconstant after standardizing a variable to have mean 0 and standard deviation 1? It seems that noconstant does something strange to the standard errors. Which standard errors are correct? Please see the example below:



                          Code:
                          . sysuse auto,clear
                          (1978 Automobile Data)
                          
                           
                          . egen std_price = std(price)
                          
                          . sum std_price
                          
                              Variable |        Obs        Mean    Std. Dev.       Min        Max
                          -------------+---------------------------------------------------------
                             std_price |         74   -4.83e-10           1  -.9744909   3.302511
                          
                          . reg std_price foreign,r
                          
                          Linear regression                               Number of obs     =         74
                                                                          F(1, 72)          =       0.20
                                                                          Prob > F          =     0.6577
                                                                          R-squared         =     0.0024
                                                                          Root MSE          =     1.0057
                          
                          ------------------------------------------------------------------------------
                                       |               Robust
                             std_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                               foreign |   .1058685   .2379326     0.44   0.658    -.3684415    .5801786
                                 _cons |  -.0314744   .1461973    -0.22   0.830    -.3229134    .2599646
                          ------------------------------------------------------------------------------
                          
                          . reg std_price foreign,r nocon
                          
                          Linear regression                               Number of obs     =         74
                                                                          F(1, 73)          =       0.16
                                                                          Prob > F          =     0.6910
                                                                          R-squared         =     0.0017
                                                                          Root MSE          =     .99917
                          
                          ------------------------------------------------------------------------------
                                       |               Robust
                             std_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                               foreign |   .0743941   .1864285     0.40   0.691    -.2971573    .4459455
                          ------------------------------------------------------------------------------

