
  • F-value of model drops severely after including time dummies and trend

    Dear Statalist members,

    I'm estimating a panel model with 160 cross-sectional units observed over a period of 30 years (annual data). Because of unobserved effects and nonstationarity, I'm using a first-difference estimator. When only my explanatory variables are included (without time dummies), the F-statistic is 68. If I add time dummies and a trend (constant), it drops to 11 and most of my controls become insignificant. Can somebody give me a possible explanation? I would have expected the joint significance to increase when additional variables are added. Additionally, in some of the regressions only three of the 29 time dummies are significant.
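    For concreteness, here is the kind of specification I mean (just a sketch with a stand-in dataset; the Grunfeld data and its variables merely take the place of my own):

    Code:
    webuse grunfeld, clear
    xtset company year
    * first differences without time dummies
    regress D.invest D.mvalue D.kstock
    * first differences with year dummies added
    regress D.invest D.mvalue D.kstock i.year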

    I hope you can help.

    CP
    Last edited by Christopher Parker; 30 Apr 2014, 15:32.

  • #2
    In general, if you add worthless variables to a model, the overall P value for a global test statistic can get worse, as can the test stats for individual coefficients. Think of the crummy variables as diluting the effects of the good variables. That is part of the reason you don't just include all 500 variables that happen to be in your data set. For a simple explanation see pp. 4-5 of

    http://www3.nd.edu/~rwilliam/xsoc63993/l41.pdf

    What you may be thinking of is that, if you add more variables to a model, R^2 will go up, or at worst stay the same. It can't go down. But that doesn't mean that the P values will get better. They can get worse if the added variables do not produce significant improvements in R^2.
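    A quick way to see this for yourself (purely an illustration with the auto data that ships with Stata; the junk variables are random noise and have nothing to do with your regressors):

    Code:
    sysuse auto, clear
    set seed 2803
    regress mpg weight length
    display "R-squared = " e(r2) "   overall F = " e(F)
    * add two predictors that are pure noise
    gen junk1 = runiform()
    gen junk2 = runiform()
    regress mpg weight length junk?
    display "R-squared = " e(r2) "   overall F = " e(F)

    R^2 creeps up slightly, but the overall F drops, exactly because the extra variables add nothing while using up degrees of freedom.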
    -------------------------------------------
    Richard Williams, Notre Dame Dept of Sociology
    StataNow Version: 18.5 MP (2 processor)

    EMAIL: [email protected]
    WWW: https://www3.nd.edu/~rwilliam



    • #3
      Thank you. The problem is that not just the standard errors but also the coefficients of the variables of interest change significantly. On one hand, one could argue that the dummies are not relevant for the model as they are insignificant and the overall F/p-value drops. On the other hand they change the coefficients of the other variables, which indicates that they are relevant for the model. Is there a common explanation for both effects? Is there some sort of best practice on how to proceed here? The results are more favorable without the dummies, but I do not want to be misguided by my wish for better results.
      Last edited by Christopher Parker; 01 May 2014, 08:52.



      • #4
        When you say that the coefficients change "significantly", I imagine that you mean "substantially"; else what significance test results are you referring to? It's best not to equivocate between technical and informal senses of the word "significant".

        Wording aside, and unless I am missing something, there seems to be no unusual problem here, just another facet of the broad problem Richard was explaining.

        Unless, exceptionally, your predictors are uncorrelated, you should expect the coefficients to change as predictors come in and out of the model. This is generic to the problem of predictor selection: not only do you have to choose which predictors to include as being important in some sense, but that choice must be made knowing that there are relationships among your predictors.
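        A tiny illustration of that point (with the auto data, nothing to do with your model): weight and length are strongly correlated, so the coefficient on weight shifts noticeably once length enters.

        Code:
        sysuse auto, clear
        correlate weight length
        * weight alone
        regress mpg weight
        * weight together with a correlated predictor
        regress mpg weight length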

        I don't think we can help much with your model choice. There should be some science to what makes sense that we can't impute. It's not even agreed by researchers that predictors must all be flagged as significant in the usual sense. You might need to include a predictor for comparability with previous work, as one of a bundle that should all be included, and so forth.




        • #5
          I agree with Nick; the changes in coefficients are not necessarily surprising. When you add more variables, the standard errors can get larger, and the coefficients can fluctuate more. Part of the reason this can occur is that, as you add more variables, you can get more multicollinearity among your predictors, making it harder to estimate the effect of each variable separately. As a general rule, there are potential consequences to adding junk to a model, and these can include bigger standard errors and volatile coefficient estimates.
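          If you want to gauge how much collinearity the added variables introduce, one simple check after an ordinary regress is estat vif (shown here on the auto data purely as an illustration):

          Code:
          sysuse auto, clear
          regress mpg weight length trunk displacement
          * variance inflation factors: large values flag predictors that
          * are close to linear combinations of the other predictors
          estat vif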

          Of course, a lot of times you may want to include junk just to show that the variables really don't matter. If you have a big enough N and/or the effects of other variables are large enough, there may be little harm to including non-significant variables. But, if you are just throwing in variables for the heck of it (e.g. who knows, maybe they do matter) then you may wind up paying a price. You usually can't afford to just toss in anything and everything that might be in your data set.

          In your case, I don't know how theoretically important or plausible all these time dummies are. But, it sounds like you are adding 30 variables here, and in general I'd be hesitant to add 30 variables if I was only acting on a hunch. (On the other hand, if everybody else doing your kind of work always adds time dummies, then, as Nick says, maybe you need to do so too).

          You might also look to see if the confidence intervals for your new coefficients include the coefficients you were getting before.
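          One way to lay the two sets of estimates side by side (again just a generic sketch; substitute your own models) is estimates table:

          Code:
          sysuse auto, clear
          regress mpg weight length
          estimates store small
          regress mpg weight length trunk turn
          estimates store big
          * coefficients and standard errors side by side, so you can check
          * whether each new interval would cover the earlier point estimate
          estimates table small big, b(%9.4f) se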
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 18.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam



          • #6
            Thank you, you have been very helpful. But I still have one question: the standard errors of my explanatory variables do not increase substantially when I include the year dummies. Does that make your explanation invalid for my case?



            • #7
              Run an experiment like the following

              Code:
              sysuse auto, clear
              * a mediocre baseline model
              reg mpg weight trunk length
              * two extra predictors that are pure random noise
              set seed 2803
              gen foo1 = runiform()
              gen foo2 = runiform()
              reg mpg weight trunk length foo?

              This first fits a mediocre model and then generates two extra predictors that are just random noise. Adding the useless predictors pulls down F. F is necessarily sensitive to changes in the mean squares it is calculated from, which in turn are sensitive to the degrees of freedom in each case. That is entirely consistent with the standard errors of the variables (by which you presumably mean those of their coefficients) being about the same.

              Your situation is different in some ways, as you are adding (it seems) several extra predictors, some of which may be better than useless, but the larger point remains that adding lots of extra predictors often does not help at all.
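              To make the dependence on the mean squares and degrees of freedom concrete, you can recompute the overall F by hand from the results -regress- leaves behind (a small aside, not specific to your data):

              Code:
              sysuse auto, clear
              reg mpg weight trunk length
              * F = (model SS / model df) / (residual SS / residual df)
              display "F recomputed: " (e(mss)/e(df_m)) / (e(rss)/e(df_r))
              display "F reported:   " e(F)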

              To get more detailed advice you really should show your commands and output.
