  • Pooled OLS vs Panel approach

    Hello,
    Could you please shed some light on the difference between pooled and panel regression models, and on when I can't use the pooled approach?

  • #2
    When you have panel data, with more than one observation per panel, the observations in the data set will usually not all be independent: traits of the panel that are not captured by the other variables will typically induce some positive within-panel correlation (or, in some special circumstances, negative correlation). In that case, standard errors (and tests based on them) calculated in a pooled regression model will be incorrect.
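
    To make the point about standard errors concrete, here is a minimal simulation sketch in Python (standard library only, not Stata). Everything in it is an assumption chosen for illustration: the panel sizes, the variances, the true slope of 1.0, and the helper names `simulate_panel` and `pooled_ols_slope`. It fits pooled OLS to data with an unobserved panel trait and compares the conventional standard error with a cluster-robust one that allows for within-panel correlation.

```python
import random
from collections import defaultdict


def simulate_panel(n_panels=200, n_periods=5, beta=1.0, seed=12345):
    """Simulate (panel, x, y) rows with an unobserved panel-level trait."""
    rng = random.Random(seed)
    rows = []
    for panel in range(n_panels):
        x_panel = rng.gauss(0, 1)  # panel-level part of the predictor
        u_panel = rng.gauss(0, 1)  # unobserved panel trait (induces correlation)
        for _ in range(n_periods):
            x = x_panel + rng.gauss(0, 1)
            y = beta * x + u_panel + rng.gauss(0, 1)
            rows.append((panel, x, y))
    return rows


def pooled_ols_slope(rows):
    """Pooled OLS of y on x; returns slope, conventional SE, cluster-robust SE."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
    sxy = sum((x - x_bar) * (y - y_bar) for _, x, y in rows)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    ssr = 0.0
    scores = defaultdict(float)  # per-panel sum of (x - x_bar) * residual
    for panel, x, y in rows:
        resid = y - intercept - slope * x
        ssr += resid ** 2
        scores[panel] += (x - x_bar) * resid

    # Conventional SE: treats all observations as independent.
    naive_se = (ssr / (n - 2) / sxx) ** 0.5
    # Cluster-robust (sandwich) SE, without any finite-sample correction.
    cluster_se = sum(s ** 2 for s in scores.values()) ** 0.5 / sxx
    return slope, naive_se, cluster_se


rows = simulate_panel()
slope, naive_se, cluster_se = pooled_ols_slope(rows)
print(f"slope={slope:.3f}  naive SE={naive_se:.4f}  cluster SE={cluster_se:.4f}")
```

    Under these assumptions the conventional standard error should come out noticeably smaller than the cluster-robust one, which is the sense in which pooled standard errors are "incorrect" here. The sandwich formula omits the small-sample adjustments that statistical packages typically apply, so the numbers will not match packaged output exactly.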

    So, in general, if you have panel data you should use a panel regression model. Pooled analysis is most suitable when each observation is independent of any other.

    That said, sometimes when you fit a panel regression, you find that the within-panel correlation of observations is negligibly small. In that case, if you prefer, you can go back and use a pooled regression model instead. Also, if you are not interested in within-panel relationships, and just want to understand the relationship between a panel's mean outcome and the mean values of the panel's predictor variables, you can calculate those means, reducing the panel data set to one observation per panel, and then do pooled regression. (Of course, this really only makes sense for continuous outcome variables, and in that case it is probably easier to just use -xtreg, be-, which does all that for you automatically, than to go through explicitly coding the calculations of all the variables' means.)
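
    The two ideas in the paragraph above can be sketched by hand in Python (standard library only, not Stata). The toy data set and the helper names `panel_means`, `between_regression`, and `anova_icc` are hypothetical, made up for illustration. The first part reduces the data to one observation per panel and regresses mean outcome on mean predictor, which is the calculation that -xtreg, be- automates. The second part is a crude one-way ANOVA intraclass correlation of the raw outcome for balanced panels; it is only a rough stand-in for the within-panel correlation a panel model would estimate after adjusting for the predictors.

```python
from collections import defaultdict

# Hypothetical balanced toy panel: (panel id, x, y), three observations each.
ROWS = [
    ("A", 1, 2), ("A", 2, 3), ("A", 3, 4),
    ("B", 4, 7), ("B", 5, 9), ("B", 6, 11),
    ("C", 7, 10), ("C", 8, 12), ("C", 9, 14),
]


def panel_means(rows):
    """Reduce the data to one (mean x, mean y) pair per panel."""
    groups = defaultdict(list)
    for panel, x, y in rows:
        groups[panel].append((x, y))
    return {
        panel: (sum(x for x, _ in obs) / len(obs),
                sum(y for _, y in obs) / len(obs))
        for panel, obs in groups.items()
    }


def between_regression(rows):
    """OLS of panel-mean y on panel-mean x; returns (intercept, slope)."""
    means = list(panel_means(rows).values())
    n = len(means)
    x_bar = sum(mx for mx, _ in means) / n
    y_bar = sum(my for _, my in means) / n
    sxx = sum((mx - x_bar) ** 2 for mx, _ in means)
    sxy = sum((mx - x_bar) * (my - y_bar) for mx, my in means)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def anova_icc(rows):
    """One-way ANOVA intraclass correlation of y; assumes balanced panels."""
    groups = defaultdict(list)
    for panel, _, y in rows:
        groups[panel].append(y)
    k = len(next(iter(groups.values())))  # observations per panel
    g = len(groups)                       # number of panels
    grand = sum(y for ys in groups.values() for y in ys) / (g * k)
    # Between-panel and within-panel mean squares.
    msb = k * sum((sum(ys) / k - grand) ** 2 for ys in groups.values()) / (g - 1)
    msw = sum((y - sum(ys) / k) ** 2
              for ys in groups.values() for y in ys) / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


intercept, slope = between_regression(ROWS)
icc = anova_icc(ROWS)
print(f"between regression: intercept={intercept}, slope={slope}")
print(f"ICC of y: {icc:.4f}")
```

    For this toy data the panel means are (2, 3), (5, 9), and (8, 12), so the between regression has slope 1.5 and intercept 0.5. An intraclass correlation near zero would suggest that pooling does little harm; here it is far from zero, so a panel model would be the safer choice.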


    • #3
      Jo:
      In-depth coverage of Clyde's excellent insight (and much else besides) can be found in: http://www.stata.com/bookstore/micro...ata/index.html
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Best explanation: it is easy to understand, and we can teach it to students in simple words while keeping the exact meaning.

        Thanks.


        Originally posted by Clyde Schechter (post #2 above)
