  • Estimation option "noconstant" - Why?

    Dear all,

    My work is based on the paper by Jordà, Schularick and Taylor (2016), "Sovereigns versus Banks: Credit, Crises, and Consequences", which is available at
    https://academic.oup.com/jeea/articl...4/1/45/2319810

    I am currently trying to work through the paper and came across the estimation option "noconstant". I have read that this option suppresses the constant term of a regression (e.g. an OLS regression). What I do not understand is why this is necessary. What implications does suppressing the constant term have for the regression?

    Maybe someone could help?

    Thank you very much in advance

  • #2
    It is seldom used. In some situations, the scientific theory behind the project implies that y = 0 when x = 0, so that the correct linear relationship is y = mx, not y = mx + b (with b different from 0). Of course, due to sampling variation, the estimated value of b from an ordinary -regress y x- will not necessarily be exactly 0. So the noconstant option exists to force the constant term to be zero.

    If the real world data generating process is not of the form y = mx (i.e. the scientific theory is wrong, or misapplied) the resulting no-constant regression will usually fit the data very badly. But if the scientific theory is being correctly applied, then the no-constant regression will be a better model of the data.

    Unless you are analyzing data from a domain where there is a strong a priori reason to assume that the linear relationships are of the form y = mx, not y = mx + b, you should not use the no-constant option. If it was used in that paper, you should try to understand on what (non-statistical) basis they concluded that their relationship was necessarily y = mx, without b. The authors probably explain that somewhere in the paper. If they don't, and if you can't figure it out based on your general knowledge of the field, I would either ask a local expert in the subject matter, or contact the authors directly.
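
To make the contrast concrete, here is a minimal sketch in plain Python (not Stata), using made-up data that follow y = 2x + 10 exactly, so the true constant is not zero. The function names and the numbers are assumptions chosen purely for illustration. It compares least squares with a constant against the same regression forced through the origin.

```python
def fit_with_constant(x, y):
    """OLS of y on x with an intercept; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_through_origin(x, y):
    """OLS of y on x with the constant forced to zero; returns the slope."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def sse(x, y, slope, intercept=0.0):
    """Sum of squared residuals for the line intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: exactly y = 2x + 10, so the true b is 10, not 0.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * xi + 10.0 for xi in x]

m, b = fit_with_constant(x, y)
m0 = fit_through_origin(x, y)

print("with constant:  slope =", round(m, 3), " constant =", round(b, 3),
      " SSE =", round(sse(x, y, m, b), 3))
print("through origin: slope =", round(m0, 3), " SSE =", round(sse(x, y, m0), 3))
```

With these numbers the no-constant slope comes out near 4.73 instead of 2, and the residual sum of squares rises from 0 to about 90.9: forcing b = 0 when the data do not support it distorts the slope as well as the fit.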


    • #3
      Thank you Clyde for your response!

      Hm, I really can't come up with any idea why the authors suppressed the constant term, and they didn't mention it in their paper. I will try to contact one of the authors and will let you know if I get an answer!
      Thanks a lot


      • #4
        Just to close this thread:

        One of the authors replied that this is to avoid perfect collinearity. It is necessary because some of my independent variables in the model are perfectly collinear.


        • #5
          Perfect collinearity might occur if, say, a categorical variable has 4 categories and you are trying to include all 4 in the model plus the constant. Usually you just drop one of the dummies but the authors may have had some reason for not doing that. Or, there may be some other reason they had this problem, but I don't know what it would be.
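
The four-category case can be checked directly. The sketch below is plain Python (not Stata) with a made-up categorical variable; the `rank` helper and the variable names are hypothetical. It builds the design matrix three ways and computes its rank by Gaussian elimination with exact fractions. A design matrix whose rank is smaller than its number of columns is perfectly collinear.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination, exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * p for a, p in zip(m[i], m[r])]
        r += 1
    return r


# Hypothetical categorical variable with 4 categories, two observations each.
levels = ["a", "b", "c", "d"]
group = ["a", "b", "c", "d", "a", "b", "c", "d"]


def dummies(g, keep):
    """0/1 indicator columns for the categories listed in keep."""
    return [1 if g == level else 0 for level in keep]


x_full = [[1] + dummies(g, levels) for g in group]              # constant + 4 dummies
x_drop_one = [[1] + dummies(g, levels[1:]) for g in group]      # constant + 3 dummies
x_noconstant = [dummies(g, levels) for g in group]              # 4 dummies, no constant

for name, x in [("constant + 4 dummies", x_full),
                ("constant + 3 dummies", x_drop_one),
                ("4 dummies, no constant", x_noconstant)]:
    print(name, "-> columns:", len(x[0]), " rank:", rank(x))
```

Only the first design has fewer independent columns than columns (5 columns, rank 4), because the four dummies add up to the constant in every row. Dropping one dummy or dropping the constant each removes that dependency.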

          If you Google

          regression through the origin

          you get a lot of hits. These articles argue that RTO is sometimes appropriate:

          https://online.stat.psu.edu/~ajw13/s...hru_origin.pdf

          https://rpubs.com/aaronsc32/regressi...ugh-the-origin

          Other articles argue that it is rare to need or want RTO and that it is usually bad to do so; and that even if RTO seems justified, including a constant term won't hurt. It can always be estimated as zero.

          I never use RTO myself.
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 19.5 MP (2 processor)

          WWW: https://www3.nd.edu/~rwilliam


          • #6
            Silvia,
            as an aside to Richard's helpful references, Kit Baum's book (https://www.stata.com/bookstore/mode...metrics-stata/) covers the issue you're interested in, in section 4.3.5 (pages 81-82).
            Kind regards,
            Carlo
            (Stata 19.0)


            • #7
              Surprising myself, I was actually able to find Kit's book on my bookshelf pretty quickly. He says you usually should not use noconstant. He does give an example of where you might want to do it -- although even there I think it is just a matter of how you would like to see the model parameterized.
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 19.5 MP (2 processor)

              WWW: https://www3.nd.edu/~rwilliam


              • #8
                Thank you very much for all the information about that topic!


                • #9
                  Another reference: Buis (2012), Stata tip 106: With or without reference. The Stata Journal 12(1): 162-164. http://www.stata-journal.com/article...article=st0250
                  ---------------------------------
                  Maarten L. Buis
                  University of Konstanz
                  Department of history and sociology
                  box 40
                  78457 Konstanz
                  Germany
                  http://www.maartenbuis.nl
                  ---------------------------------


                  • #10
                    Hi,
                    I'd like to know if comments from Clyde Schechter's post and Christopher F. Baum's book about "noconstant" apply also for nonlinear models like logit or probit?


                    • #11
                      Yes, the comments apply equally to logit, probit and other similar generalized linear models. They are all built on a linear backbone, with some non-linear function linking it to the outcome variable. That doesn't change the reasoning about the no-constant option.
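
As a rough illustration for the logit case, the sketch below (plain Python, not Stata) fits a one-regressor logit by Newton-Raphson with and without a constant. Everything in it is an assumption made for the example: the function names, the true values a = 1 and b = 1, and the shortcut of using the true probabilities as outcomes instead of random 0/1 draws, which keeps the example deterministic.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, constant=True, iterations=50):
    """Newton-Raphson for a one-regressor logit; returns (a, b).

    With constant=False the intercept a is held at zero (the noconstant case).
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [logistic(a + b * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Score (gradient of the log-likelihood) with respect to a and b.
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix entries.
        h_aa = sum(w)
        h_ab = sum(xi * wi for xi, wi in zip(x, w))
        h_bb = sum(xi * xi * wi for xi, wi in zip(x, w))
        if constant:
            det = h_aa * h_bb - h_ab * h_ab
            step_a = (h_bb * g_a - h_ab * g_b) / det
            step_b = (h_aa * g_b - h_ab * g_a) / det
            a, b = a + step_a, b + step_b
        else:
            b += g_b / h_bb
    return a, b


# Hypothetical setup: true model is P(y = 1 | x) = logistic(1 + 1 * x).
true_a, true_b = 1.0, 1.0
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [logistic(true_a + true_b * xi) for xi in x]  # expected outcomes, not 0/1 draws

a_hat, b_hat = fit_logit(x, y, constant=True)
a0, b0 = fit_logit(x, y, constant=False)

print("with constant: a =", round(a_hat, 3), " b =", round(b_hat, 3))
print("noconstant:    a =", round(a0, 3), " b =", round(b0, 3))
```

With the constant included, the fit recovers a = 1 and b = 1. Without it, the model is forced to predict a probability of 0.5 at x = 0, and the slope settles at roughly 0.83 instead of 1.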


                      • #12
                        This thread is a good example of why we should not ask vague and overly general questions, and why we should present not our interpretation of what happened but concrete evidence of what actually happened (e.g., what was typed into Stata and what Stata returned).

                        The original poster asked a vague and overly general question, with no regression output and no actual data.

                        Clyde interpreted the question as "why do we do regression through the origin?" and gave a nice and accurate answer to that question.

                        And then it turned out that we are not dealing with a regression through the origin, but we are dealing with a constant that is implicitly defined by a combination of variables. In Stata terms, it turned out that we are not dealing with the -noconstant- option, but with the -hasconstant- option.
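
The distinction can be checked numerically. The sketch below is plain Python (not Stata) with made-up data and hypothetical names. It fits a four-category model two ways: all four dummies with no explicit constant, and a constant plus three dummies. Instead of a general least-squares solver it uses the closed-form solution for a model that contains only category dummies (each coefficient is a group mean or a difference of group means), a shortcut that holds only for this kind of design.

```python
# Hypothetical data: one categorical regressor with 4 levels.
levels = ["a", "b", "c", "d"]
group = ["a", "a", "b", "b", "c", "c", "d", "d"]
y = [1.0, 3.0, 4.0, 6.0, 7.0, 9.0, 10.0, 12.0]


def group_mean(level):
    values = [yi for g, yi in zip(group, y) if g == level]
    return sum(values) / len(values)


means = {level: group_mean(level) for level in levels}

# Parameterization 1: four dummies, no explicit constant.
# Least-squares coefficients are the group means.
coef_all_dummies = dict(means)

# Parameterization 2: constant plus dummies for b, c, d ("a" is the reference).
# The constant is the mean of "a"; the others are differences from it.
coef_reference = {"_cons": means["a"]}
for level in levels[1:]:
    coef_reference[level] = means[level] - means["a"]

fitted_all_dummies = [coef_all_dummies[g] for g in group]
fitted_reference = [coef_reference["_cons"] + coef_reference.get(g, 0.0)
                    for g in group]

print("all dummies, no constant:", coef_all_dummies)
print("constant + 3 dummies:    ", coef_reference)
print("same fitted values:", fitted_all_dummies == fitted_reference)
```

Both versions give identical fitted values, so the "no constant" model is not a regression through the origin: the constant is still there, spread across the four dummy coefficients.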

                        So, at the great level of vagueness and generality at which you are asking your question: yes, what is said about the constant in this thread also applies to probit and logit. Probit and logit are y = 1[a + bx + e > 0], so if you omit a when in fact a is different from 0, you will bias the estimate of b.

                        But a more productive way to get useful advice would be to explain exactly why (for what purpose) you want to omit the constant in your logit/probit.




                        Originally posted by Brian Yalle:
                        Hi,
                        I'd like to know if comments from Clyde Schechter's post and Christopher F. Baum's book about "noconstant" apply also for nonlinear models like logit or probit?
