
  • Effect of constrained coefficients on t-stat of unconstrained coefficient in OLS

    Hi All,

    I am in the process of responding to an R&R at a premier journal in my area. In the course of this, I want to show that a certain result obtains when the coefficients are constrained to be equal to the coefficients from another unconstrained OLS regression. Everything worked as expected, except for the t-stat exploding on the single, unconstrained variable in the constrained model. Below is an example of what happened.

    Likely, there is literature somewhere noting that this is going to happen and explaining why, but I can't find it, and I really want to cite something to back up my claim and my example. Also, note that the size of the sample is 11,361; 10 more degrees of freedom from constraining 10 coefficients seems unlikely to boost the t-stat by more than 10X!

    Any suggestions?

    Thanks.
    Dave
    Variable               Unconstrained model   Coefficients constrained equal to
                                                 estimates from unconstrained model
    Theory variable being tested:
    DQ                      -7.724***             -7.724***
                            (-3.81)               (-40.83)
    Control variables:
    Vol_EPS                  0.944***              0.944
                            (18.87)               (constrained)
    Growth                   0.506**               0.506
                             (2.27)               (constrained)
    ROA                    -17.476***            -17.476
                           (-11.33)               (constrained)
    Log(AF)                 -2.382***             -2.382
                            (-9.29)               (constrained)
    Log(at)                  2.518***              2.518
                            (20.21)               (constrained)
    Restructure             -0.663                -0.663
                            (-1.30)               (constrained)
    MA                      -0.491**              -0.491
                            (-1.99)               (constrained)
    SI                      18.982***             18.982
                             (4.11)               (constrained)
    Volret                  15.538***             15.538
                             (5.63)               (constrained)
    Log(NSEG)               -0.928***             -0.928
                            (-3.54)               (constrained)
    Constant                 4.502**              13.582
                             (2.22)
    Ind and Year FE         YES                   YES (constrained)
    N                       11,361                11,361
    Adj. R-square           0.266                 NOT REPORTED

  • #2
    What you've done here is take all of the variance explained by the unconstrained model and, by imposing those constraints, attribute all of it to the single unconstrained variable DQ. Another way to look at the same thing: you have changed a regression of your outcome on DQ + 10 other variables into a regression of outcome2 on just DQ, where outcome2 is outcome - 0.944*Vol_EPS - 0.506*Growth ... -0.928*ln(NSEG). And, crucially, that definition of outcome2 was chosen precisely so as to minimize the residual variance when it is regressed on DQ alone.

    Either way of looking at it implies that your result is unsurprising. With the residual variance squashed down to a minimum, the standard error declines in proportion to that, with a corresponding blow-up of the t-statistic.

    I cannot point you to a reference about this specific situation. I have never seen anybody do this before, and, honestly, I cannot imagine why a reviewer has asked for it. Be that as it may, the concepts underlying what happened are just a basic understanding of explained variance, residual variance, and how the t-statistic relates to the latter. Those can be found in any basic regression textbook, and probably could, in any case, be asserted without specific citation.
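    The mechanics can be illustrated with a quick simulation. This is only a sketch: the variables `dq`, `x1`, `x2` and the data-generating process below are invented for illustration, not Dave's actual data. It fits an unconstrained OLS model, then re-fits with the control coefficients pinned at their unconstrained estimates, as in the constrained column of the table.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Invented data: DQ and the "controls" share a common factor z,
# so the regressors are correlated (as real controls usually are)
z = rng.normal(size=n)
dq = z + rng.normal(scale=0.5, size=n)
x1 = z + rng.normal(scale=0.5, size=n)
x2 = z + rng.normal(scale=0.5, size=n)
y = 1.0 - 2.0 * dq + 1.5 * x1 + 0.8 * x2 + rng.normal(scale=2.0, size=n)

def ols(X, y):
    """OLS coefficients, standard errors, and t-statistics."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, se, b / se

ones = np.ones(n)

# Unconstrained model: constant, DQ, and the two controls
b_full, se_full, t_full = ols(np.column_stack([ones, dq, x1, x2]), y)

# "Constrained" model: the control coefficients are fixed at their
# unconstrained estimates, so their contribution is subtracted from y
# and only the constant and DQ are re-estimated
y_star = y - x1 * b_full[2] - x2 * b_full[3]
b_con, se_con, t_con = ols(np.column_stack([ones, dq]), y_star)

print(f"DQ coefficient: unconstrained {b_full[1]:.3f}, constrained {b_con[1]:.3f}")
print(f"DQ t-stat:      unconstrained {t_full[1]:.1f}, constrained {t_con[1]:.1f}")
```

    The DQ coefficient comes back identical (fixing the other coefficients at their own joint optimum leaves the restricted minimizer unchanged), but its standard error shrinks and the t-statistic inflates, because DQ's sampling variance no longer carries the inflation from its correlation with the controls that are now absent from the design matrix.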



    • #3
      Hi Clyde,

      Thank you very much for getting back so quickly. I think that what you are saying sounds good. If this is the explanation, then it can be tested by running the regression in two stages. First, run the unconstrained model without DQ and collect the residuals (or the constrained model, since the residuals should be identical). Second, regress these residuals on DQ alone. But, I don't think doing this will attribute all of the explanatory power of the unconstrained model to DQ as in both cases the other variables are accounting for their portions of the variance of the dependent variable; for DQ to retain the same coefficient, I think it reflects only the dependent variable's variance conditional on the other independent variables. That is, if a two stage regression were run, the residuals from the first stage would only retain whatever variance remains after accounting for the effects of the other independent variables; I believe the residuals would be orthogonal to the independent variables included in the first stage.
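      That two-stage intuition is easy to check numerically. In the sketch below (invented simulated data, not the actual dataset), regressing the first-stage residuals on raw DQ does not recover the full-model coefficient; it is recovered only when DQ is also residualized on the other regressors, which is the Frisch-Waugh-Lovell theorem.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)
dq = z + rng.normal(scale=0.5, size=n)   # correlated with the controls via z
x1 = z + rng.normal(scale=0.5, size=n)
x2 = z + rng.normal(scale=0.5, size=n)
y = 1.0 - 2.0 * dq + 1.5 * x1 + 0.8 * x2 + rng.normal(scale=2.0, size=n)

def coef(X, y):
    """OLS coefficient vector."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
b_dq_full = coef(np.column_stack([ones, dq, x1, x2]), y)[1]  # full-model DQ coef

# Stage 1: regress y on the other regressors only, keep residuals
X_other = np.column_stack([ones, x1, x2])
e_y = y - X_other @ coef(X_other, y)

# Stage 2a: residuals on raw DQ -- does NOT recover the full-model coefficient
b_raw = coef(np.column_stack([ones, dq]), e_y)[1]

# Stage 2b (Frisch-Waugh-Lovell): residualize DQ on the others as well --
# this DOES recover the full-model coefficient exactly
e_dq = dq - X_other @ coef(X_other, dq)
b_fwl = coef(e_dq.reshape(-1, 1), e_y)[0]

print(f"full model:              {b_dq_full:.4f}")
print(f"residuals on raw DQ:     {b_raw:.4f}")
print(f"FWL (both residualized): {b_fwl:.4f}")
```

      The raw-DQ version is attenuated because the part of DQ that is correlated with the other regressors is orthogonal to the first-stage residuals and so contributes nothing to the fit.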

      Another explanation, consistent with some other things I saw just today, is that the standard error of each coefficient is a function of the covariances of all the independent variables, which OLS estimates simultaneously with the regression coefficients. What we may have done with our constraints is to remove from DQ's conditional standard error the effects of these covariances, which can either increase or decrease a coefficient's standard error. If the overall impact of these was to increase DQ's standard error, then removing them could have exactly the effect we see.

      I'm basing this possibility on comments at https://stats.stackexchange.com/ques...gression#97207, where the point is established about other independent variables' covariances increasing or decreasing the standard error of another coefficient, and https://www.statalist.org/forums/for...=1710712760387, where in footnote 1, the formula for coefficients' standard errors is fully written out.

      Of course, perhaps I'm completely wrong. Any comments and suggestions are appreciated.

      Thanks,
      Dave



      • #4
        But, I don't think doing this will attribute all of the explanatory power of the unconstrained model to DQ as in both cases the other variables are accounting for their portions of the variance of the dependent variable; for DQ to retain the same coefficient, I think it reflects only the dependent variable's variance conditional on the other independent variables. That is, if a two stage regression were run, the residuals from the first stage would only retain whatever variance remains after accounting for the effects of the other independent variables; I believe the residuals would be orthogonal to the independent variables included in the first stage.
        This is correct. Doing a "two stage" regression where the first stage excludes DQ will not produce the same results as you got. It is important to understand, and I probably should have made this explicit in my response, that the coefficients of any variable(s) in a regression, as well as the proportion of variance explained by a subset of the variables, must always be understood to be conditional on all other variables (if any) in the model. Indirectly I did say that: I referred to what you did as taking "all of the variance explained by the unconstrained model" and attributing it all to DQ. Notice that I did not say "all of the variance explained by the other variables in the model."
