
  • Is it necessary to control for regional fixed effects AND cluster at the regional level in fixed-effects panel analysis?

    In my reading of fixed-effects panel studies, I've noticed that many researchers control for location fixed effects and also cluster standard errors at that location. I was wondering whether both are necessary. For example, in the context of local unemployment and health in the US, my understanding is that the state dummies in a fixed-effects logistic regression control for any time-invariant state-level factors that are correlated with both state economic conditions and health. So what does clustering standard errors at the state level add in such an analysis?

    This is something I see a lot and am interested in. A better example can be found in the really interesting study below:

    Janet Currie, Valentina Duque, and Irwin Garfinkel, "The Great Recession and Mothers’ Health", The Economic Journal: https://academic.oup.com/ej/article/...8/F311/5077911

    Kindest regards,

    John

  • #2
    John:
    cluster-robust standard errors (if you are referring to panel-data regression) take into account both heteroskedasticity and within-panel serial correlation of the epsilon residuals.
    Kind regards,
    Carlo
    (Stata 19.0)
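To make Carlo's point concrete, here is a minimal sketch (in Python/numpy rather than Stata, so that each step is visible) of what a cluster-robust "sandwich" variance computes. Everything in it is an assumption for illustration: the simulated data, the 29 regions, and the helper names. It uses pooled OLS without fixed effects so that only the clustering is at work, and its finite-sample factor is the common CR1 form, which may not be exactly what Stata applies.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients and residuals; X already holds the constant and any dummies."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def conventional_se(X, resid):
    """Classical SEs: errors assumed i.i.d. (homoskedastic and uncorrelated)."""
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

def cluster_robust_se(X, resid, clusters, small_sample=True):
    """Cluster-robust (sandwich) SEs.

    Each cluster's score X_g' e_g is summed *before* taking the outer product,
    so errors may be heteroskedastic and arbitrarily correlated within a cluster
    (e.g. over time); independence is assumed only across clusters.
    """
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    labels = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in labels:
        score = X[clusters == g].T @ resid[clusters == g]
        meat += np.outer(score, score)
    V = bread @ meat @ bread
    if small_sample:
        G = len(labels)
        # CR1-style finite-sample factor (assumption: Stata's exact factor may differ)
        V = V * G / (G - 1) * (n - 1) / (n - k)
    return np.sqrt(np.diag(V))

# Hypothetical data: 29 regions x 10 years; both the regressor and the error
# carry a persistent region-level component, so errors are correlated within region.
rng = np.random.default_rng(42)
G, T = 29, 10
region = np.repeat(np.arange(G), T)
x = rng.normal(size=G)[region] + 0.3 * rng.normal(size=G * T)
e = rng.normal(size=G)[region] + rng.normal(size=G * T)
y = 0.5 * x + e
X = np.column_stack([np.ones(G * T), x])
beta, resid = ols(X, y)
print("slope:", beta[1])
print("conventional SE:  ", conventional_se(X, resid)[1])
print("cluster-robust SE:", cluster_robust_se(X, resid, region)[1])
```

With one observation per cluster, the same formula (without the finite-sample factor) reduces to White's heteroskedasticity-robust estimator; grouping observations into clusters is what adds the allowance for within-cluster correlation, including serial correlation over time.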



    • #3
      Yes, it is necessary to cluster at the level of your main fixed effects. This is the consensus in the current econometrics literature, and a key reference on this matter is:
      Kezdi, G. (2003). Robust standard error estimation in fixed-effects panel models. Available at SSRN 596988.
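Why fixed effects do not make clustering redundant can be illustrated with a small Monte Carlo sketch (Python/numpy; all parameters and helper names are hypothetical). The region effect u is correlated with the regressor, so FE are needed, and the within transformation removes it exactly. But both the regressor and the idiosyncratic error are assumed to follow AR(1) processes within each region, so the demeaned errors remain serially correlated. Comparing the simulated spread of the FE slope with the average conventional and cluster-robust standard errors shows how far each is from the truth in this setting; with this kind of persistence one would expect the conventional SE to understate the spread.

```python
import numpy as np

def within(v, groups):
    """Subtract group means: the -fe- within transformation."""
    _, inv = np.unique(groups, return_inverse=True)
    return v - (np.bincount(inv, weights=v) / np.bincount(inv))[inv]

def ar1_by_group(rng, G, T, rho):
    """G independent stationary AR(1) series of length T, stacked group by group."""
    out = np.empty((G, T))
    out[:, 0] = rng.normal(size=G) / np.sqrt(1 - rho**2)
    for t in range(1, T):
        out[:, t] = rho * out[:, t - 1] + rng.normal(size=G)
    return out.ravel()

def fe_slope_and_ses(y, x, groups):
    """FE slope plus conventional and cluster-robust SEs, clustering on the FE groups."""
    xd, yd = within(x, groups), within(y, groups)
    sxx = xd @ xd
    b = xd @ yd / sxx
    e = yd - b * xd
    n, G = len(y), len(np.unique(groups))
    se_conv = np.sqrt(e @ e / (n - G - 1) / sxx)       # i.i.d. errors assumed
    _, inv = np.unique(groups, return_inverse=True)
    scores = np.bincount(inv, weights=xd * e)          # one summed score per cluster
    se_clu = np.sqrt(G / (G - 1) * (scores @ scores)) / sxx
    return b, se_conv, se_clu

# Hypothetical Monte Carlo: 29 regions x 20 years; u is correlated with x
# (so FE are needed), and x and the idiosyncratic error are AR(1) within region.
rng = np.random.default_rng(1)
G, T, R, rho = 29, 20, 500, 0.8
region = np.repeat(np.arange(G), T)
draws = []
for _ in range(R):
    u = rng.normal(size=G)[region]
    x = ar1_by_group(rng, G, T, rho) + u
    eps = ar1_by_group(rng, G, T, rho)
    y = 0.5 * x + u + eps
    draws.append(fe_slope_and_ses(y, x, region))
b, se_conv, se_clu = map(np.array, zip(*draws))
print("simulated SD of the FE slope:", b.std())
print("mean conventional SE:        ", se_conv.mean())
print("mean cluster-robust SE:      ", se_clu.mean())
```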



      • #4
        Thank you both for your feedback, and sorry it took me so long to respond; I must have missed the email notification.

        I guess that, even after reading the article linked above, I'm still unclear about what clustering at the regional level and controlling for regional fixed effects are each doing in a fixed-effects analysis of regional unemployment on health.

        I assume that controlling for regional fixed effects controls for any region-specific effects on health, but then what is clustering at the regional level doing?

        Is this clustering taking within-region correlation in the effect of unemployment on health outcomes into account? For example, in case the effect of unemployment on health is bigger for everyone who lives in one specific region?

        So, controlling for region controls for health that would be worse by virtue of the region you're in, and clustering by region controls for region-specific effects of regional unemployment on health?

        Maybe you have already explained this and I just don't understand, sorry. Could you explain why clustering and region-specific controls are both included in this kind of analysis?

        Kindest regards,

        John



        • #5
          John:
          - the regional fixed effect has to do with the u_i component of the composite error term (i.e., the panel-wise effect, if any);
          - clustered standard errors have to do with the epsilon_it component of the composite error term (i.e., the idiosyncratic error).
          Kind regards,
          Carlo
          (Stata 19.0)



          • #6
            Hi Carlo,

            So, if I'm understanding you correctly, regional fixed effects add the effect that the region itself has to the error term (if it has an effect) while clustered standard errors at the regional level add the common effect on the residuals of being in that area to the error term?

            Is that right?

            Thanks,

            John



            • #7
              John:
              - in a panel-data regression equation, the error term is composed of two components: u_i and epsilon_it;
              - u_i represents the panel-wise effect (if any), which is constant within the same panel. The -fe- machinery allows u_i to be correlated with the vector of regressors (a weak form of endogeneity), which is not allowed under pooled OLS or the -re- specification;
              - as far as standard errors clustered on -panelid- are concerned, clustering takes into account both heteroskedasticity and serial correlation of epsilon_it (i.e., the idiosyncratic error of the regression, which varies over both -id- and -t-). Note that epsilon_it is the same error you find in an OLS equation, with the relevant difference that cross-sectional OLS uses one wave of data only (hence, epsilon varies over -id- only).
              Kind regards,
              Carlo
              (Stata 19.0)
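In the usual textbook notation (the symbols below are an assumption, chosen to match the ui and epsilonit labels in Carlo's post), the two roles can be written out separately: the within transformation removes u_i from the equation, while clustering only changes how the variance of the resulting estimator is computed, by summing the scores within each cluster g before squaring them.

```latex
\begin{aligned}
y_{it} &= \mathbf{x}_{it}'\boldsymbol{\beta} + u_i + \varepsilon_{it} \\
y_{it} - \bar{y}_i &= (\mathbf{x}_{it} - \bar{\mathbf{x}}_i)'\boldsymbol{\beta} + (\varepsilon_{it} - \bar{\varepsilon}_i),
  \qquad \text{since } u_i - \bar{u}_i = 0 \\
\widehat{V}_{\text{cluster}}(\hat{\boldsymbol{\beta}}_{FE}) &=
  \Big(\sum_{i,t} \ddot{\mathbf{x}}_{it}\ddot{\mathbf{x}}_{it}'\Big)^{-1}
  \Big(\sum_{g=1}^{G} \mathbf{s}_g \mathbf{s}_g'\Big)
  \Big(\sum_{i,t} \ddot{\mathbf{x}}_{it}\ddot{\mathbf{x}}_{it}'\Big)^{-1},
  \qquad \mathbf{s}_g = \sum_{(i,t) \in g} \ddot{\mathbf{x}}_{it}\,\hat{\ddot{\varepsilon}}_{it}
\end{aligned}
```

Here a double dot denotes the deviation from the panel mean (e.g. \ddot{\mathbf{x}}_{it} = \mathbf{x}_{it} - \bar{\mathbf{x}}_i) and G is the number of clusters (for example, regions); Stata additionally multiplies this variance by a finite-sample factor.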



              • #8
                Carlo,

                I think I get it.

                In my fixed-effects analysis of the effect of regional unemployment on individuals' self-rated health, which includes a dummy variable for the region the individual lives in and also clusters standard errors by that region, I think:

                1) The dummy controls for the effect of the region the individual lives in on the relationship between regional unemployment and health in the regression (e.g., if people in rural areas have better health in general).

                2) Clustering at the regional level accounts for possible serial correlation or heteroskedasticity in the residuals/error term within the region the person lives in?

                Is that right?

                Thanks for your help!

                John



                • #9
                  John:
                  while I share your explanation of point 2), to reply helpfully to your point 1) I need to know whether you are referring to -regress- or -xtreg, fe- (I assume that you have a continuous regressand).
                  Kind regards,
                  Carlo
                  (Stata 19.0)
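On the -regress- versus -xtreg, fe- distinction, here is a small sketch (Python/numpy, hypothetical data and helper name) of what the within transformation does to region dummies when the panels are individuals. If nobody changes region, the demeaned dummies are identically zero, so -xtreg, fe- has nothing left to estimate for them and drops them as collinear; only individuals who move between regions supply within variation. Under pooled -regress-, by contrast, the same dummies capture level differences between regions.

```python
import numpy as np

def demean_by(M, groups):
    """Subtract panel-specific means from every column of M (the -fe- within transformation)."""
    _, inv = np.unique(groups, return_inverse=True)
    counts = np.bincount(inv)
    out = np.asarray(M, dtype=float).copy()
    for j in range(out.shape[1]):
        out[:, j] -= (np.bincount(inv, weights=out[:, j]) / counts)[inv]
    return out

# Hypothetical panel: 6 individuals x 4 waves living in 3 regions; nobody moves.
person = np.repeat(np.arange(6), 4)
region = np.array([0, 0, 1, 1, 2, 2])[person]
dummies = (region[:, None] == np.arange(1, 3)).astype(float)   # region 0 is the base

print(np.abs(demean_by(dummies, person)).max())    # 0.0: no within variation left

# Hypothetical mover: individual 5 switches from region 2 to region 1 in the last wave.
region_moved = region.copy()
region_moved[-1] = 1
moved = (region_moved[:, None] == np.arange(1, 3)).astype(float)
print(np.abs(demean_by(moved, person)).max() > 0)  # True: movers supply within variation
```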



                  • #10
                    Hi Carlo,

                    Sorry, you are right. I am thinking of a linear probability model with a continuous "percentage unemployed" predictor, dummies for the region where the people in the sample live, and a binary outcome of "poor self-rated health", estimated with xtreg, fe as in the example I shared in another post here: https://www.statalist.org/forums/for...ded-in-any-way

                    Thank you for your help,

                    Jonathan



                    • #11
                      John:
                      Sticking with your query 1): if you're using -xtreg, fe-, and going back to your previous post, it seems to me that:
                      - you -xtset- your panel dataset using a -panelid- (which differs from -i.region-) and a -timevar-;
                      - that -panelid- defines your fixed effect;
                      - you added -i.region- to your set of predictors: this choice means that you are interested in investigating how different zones of Ireland contribute to explaining the within variation of the conditional mean of your regressand, adjusted for the remaining predictors. That said, if you also want to obtain a fixed effect for this predictor, you would be better off switching to the community-contributed module -reghdfe-;
                      - you clustered your standard errors on -region-: this choice means that you consider the idiosyncratic errors of panels living in the same area to be more similar to each other than to those of panels living in different areas.
                      In addition:
                      - clustering takes both autocorrelation and heteroskedasticity into account;
                      - 29 clusters (if I am not mistaken) may be considered enough by some reviewers and too few by others.

                      As an aside, your within R-squared (the one to consider for an -fe- specification) seems pretty low.
                      I would double-check whether your model is correctly specified.
                      Last edited by Carlo Lazzaro; 11 Feb 2021, 09:50.
                      Kind regards,
                      Carlo
                      (Stata 19.0)
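The idea behind absorbing more than one set of fixed effects, as -reghdfe- does, can be sketched with alternating projections: demean by each grouping in turn until a full sweep no longer changes anything, then run OLS on the residualized variables. This Python/numpy sketch is not -reghdfe-'s implementation (which, as far as I know, adds acceleration and other refinements); the helper names and data are hypothetical, and the data assume that individuals change region often, since otherwise the region effects would already be absorbed by the individual ones.

```python
import numpy as np

def demean_once(v, groups):
    """Subtract group means of v for one grouping."""
    _, inv = np.unique(groups, return_inverse=True)
    return v - (np.bincount(inv, weights=v) / np.bincount(inv))[inv]

def absorb(v, group_list, tol=1e-10, max_iter=10_000):
    """Partial out several sets of fixed effects by alternating projections:
    demean by each grouping in turn until a full sweep no longer changes v."""
    v = np.asarray(v, dtype=float)
    for _ in range(max_iter):
        v_old = v
        for groups in group_list:
            v = demean_once(v, groups)
        if np.max(np.abs(v - v_old)) < tol:
            return v
    raise RuntimeError("alternating projections did not converge")

# Hypothetical panel: 40 individuals x 5 waves, 6 regions, with frequent moves.
rng = np.random.default_rng(3)
n_person, n_wave, n_region = 40, 5, 6
person = np.repeat(np.arange(n_person), n_wave)
region = rng.integers(0, n_region, size=person.size)
x = rng.normal(size=person.size) + 0.5 * region
y = 0.3 * x + rng.normal(size=n_person)[person] + 0.2 * region + rng.normal(size=person.size)

xt = absorb(x, [person, region])
yt = absorb(y, [person, region])
b_absorb = xt @ yt / (xt @ xt)

# Same slope from a dummy-variable regression with both sets of dummies.
D = np.column_stack([person[:, None] == np.arange(n_person),
                     region[:, None] == np.arange(n_region)]).astype(float)
b_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]
print(b_absorb, b_lsdv)
```

The dummy-variable regression at the end is only a check: both routes give the same slope, but absorbing avoids building thousands of dummy columns when there are many individuals.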



                      • #12
                        Carlo, thank you for taking the time to provide such an informative response; I am now much clearer on the above.

                        I'll admit I hadn't noticed that the within R-squared was that low. Is it too low? What does that mean for the model? I think the fault lies in limited movement in the predictor, as unemployment mostly moves by 0.01 percentage points in my analysis, and only infrequently by a full 1.00 percentage point. Also, my sample size is quite small at 681 individuals. Or could this be the fault of applying a linear probability model?

                        With my best wishes,

                        John
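One way to check whether limited movement in the predictor is the issue is to split its standard deviation into between and within components, in the spirit of Stata's -xtsum-. The Python/numpy helper below and the simulated unemployment figures are hypothetical (only the 681 individuals come from the post). The FE slope is identified only from the within part, and its conventional variance is inversely proportional to the sum of squared deviations from the panel means, so a tiny within SD means imprecise estimates.

```python
import numpy as np

def xtsum_like(v, panel):
    """Overall, between and within SDs of v, in the spirit of Stata's -xtsum-."""
    _, inv = np.unique(panel, return_inverse=True)
    means = np.bincount(inv, weights=v) / np.bincount(inv)
    overall = v.std(ddof=1)
    between = means.std(ddof=1)                        # SD of the panel means
    within = (v - means[inv] + v.mean()).std(ddof=1)   # SD of deviations from panel means
    return overall, between, within

# Hypothetical data: 681 individuals (as in the post), 4 waves and 29 regions (assumed);
# regional unemployment differs by a couple of points across regions but moves
# only by a few hundredths of a point from one wave to the next.
rng = np.random.default_rng(11)
n_people, n_waves, n_regions = 681, 4, 29
region_of_person = rng.integers(0, n_regions, size=n_people)
level = rng.normal(8.0, 2.0, size=n_regions)
drift = rng.normal(0.0, 0.05, size=(n_regions, n_waves)).cumsum(axis=1)
person = np.repeat(np.arange(n_people), n_waves)
wave = np.tile(np.arange(n_waves), n_people)
unemp = level[region_of_person][person] + drift[region_of_person[person], wave]

overall, between, within = xtsum_like(unemp, person)
print(f"overall SD {overall:.3f}, between SD {between:.3f}, within SD {within:.3f}")
```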



                        • #13
                          John:
                          your model reported:
                          Code:
                          R-sq:  within  = 0.0502
                          which is low.
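For reference, the within R-squared of -xtreg, fe- is, as I understand it, the R-squared of the regression run on panel-demeaned data. A Python/numpy sketch for a linear probability model with a binary outcome (helper names, effect sizes and the 4 waves are hypothetical assumptions):

```python
import numpy as np

def demean(M, panel):
    """Subtract panel means from a vector or from each column of a matrix."""
    M = np.asarray(M, dtype=float)
    _, inv = np.unique(panel, return_inverse=True)
    counts = np.bincount(inv)
    if M.ndim == 1:
        return M - (np.bincount(inv, weights=M) / counts)[inv]
    return np.column_stack([demean(M[:, j], panel) for j in range(M.shape[1])])

def within_r2(y, X, panel):
    """R-squared of the regression of demeaned y on demeaned X."""
    yd, Xd = demean(y, panel), demean(X, panel)
    b = np.linalg.lstsq(Xd, yd, rcond=None)[0]
    resid = yd - Xd @ b
    return 1.0 - (resid @ resid) / (yd @ yd)

# Hypothetical linear probability model: binary "poor health" outcome, a small
# true effect of the regressor, and individual effects absorbed by FE.
rng = np.random.default_rng(5)
panel = np.repeat(np.arange(681), 4)
x = rng.normal(size=panel.size)
p = np.clip(0.2 + 0.05 * x + 0.1 * rng.normal(size=681)[panel], 0, 1)
y = (rng.uniform(size=panel.size) < p).astype(float)
print("within R-squared:", within_r2(y, x[:, None], panel))
```

Because each individual's binary outcome carries a lot of idiosyncratic noise, a low within R-squared is common in models like this and need not mean that the coefficient of interest is badly estimated.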

                          It might also be that you have so many areas and they do not play such a relevant role in explaining variation in the regressand, other things being equal.
                          As usual, the yardstick is the literature in your research field: if low R-squared values are common there, your model may still be informative.
                          That said, I would still test your regression specification by adding the fitted values and their square to the set of predictors (see -linktest- for the underlying theory).
                          Kind regards,
                          Carlo
                          (Stata 19.0)
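Carlo's suggestion can be sketched as follows, in the spirit of -linktest-: fit the FE model, take the linear prediction excluding the fixed effects (in Stata terms, roughly -predict- with the -xb- option after -xtreg, fe-), then refit the FE model on that prediction and its square and look at the t statistic on the squared term. This Python/numpy version is a hypothetical illustration with simulated data; for simplicity it clusters on the panel identifier, whereas in John's model one would cluster on region.

```python
import numpy as np

def demean(M, panel):
    """Subtract panel means from a vector or from each column of a matrix."""
    M = np.asarray(M, dtype=float)
    _, inv = np.unique(panel, return_inverse=True)
    counts = np.bincount(inv)
    if M.ndim == 1:
        return M - (np.bincount(inv, weights=M) / counts)[inv]
    return np.column_stack([demean(M[:, j], panel) for j in range(M.shape[1])])

def fe_fit(y, X, panel):
    """Within (FE) coefficients with SEs clustered on the panel identifier."""
    Xd, yd = demean(X, panel), demean(y, panel)
    b = np.linalg.lstsq(Xd, yd, rcond=None)[0]
    e = yd - Xd @ b
    _, inv = np.unique(panel, return_inverse=True)
    G = inv.max() + 1
    bread = np.linalg.inv(Xd.T @ Xd)
    scores = np.zeros((G, Xd.shape[1]))
    np.add.at(scores, inv, Xd * e[:, None])    # summed score per cluster
    V = G / (G - 1) * bread @ (scores.T @ scores) @ bread
    return b, np.sqrt(np.diag(V))

def link_test(y, X, panel):
    """t statistic on the squared prediction when the FE model is refitted on
    its own linear prediction and that prediction squared."""
    b, _ = fe_fit(y, X, panel)
    xb = X @ b          # linear prediction without the fixed effects
    coef, se = fe_fit(y, np.column_stack([xb, xb**2]), panel)
    return coef[1] / se[1]

rng = np.random.default_rng(7)
panel = np.repeat(np.arange(60), 5)
x = rng.uniform(1, 3, size=panel.size)
alpha = rng.normal(size=60)[panel]
y_linear = 0.5 * x + alpha + rng.normal(scale=0.3, size=panel.size)
y_quadratic = x**2 + alpha + rng.normal(scale=0.3, size=panel.size)
print("t on squared term, linear truth:   ", link_test(y_linear, x[:, None], panel))
print("t on squared term, quadratic truth:", link_test(y_quadratic, x[:, None], panel))
```

A clearly significant squared term, as one would expect in the quadratic case, hints that the linear specification is missing something; a non-significant one is reassuring but does not rule out omitted variables or reverse causation.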



                          • #14
                            Originally posted by Carlo Lazzaro:
                            John:
                            It might also be that you have so many areas and they do not play such a relevant role in explaining variation in the regressand, other things being equal.
                            I study the effect of increases in local unemployment on health during a recession. Could this then reflect that no individual area's rise in unemployment explains the variation in the regressand, but rather that unemployment is rising in all areas and they share this effect because of the time period studied?



                            • #15
                              John:
                              I would say that almost none of the areas explains the within variation (as we are talking about fixed effects) in the regressand.
                              Taking another look at your results, my amateur's opinion, as far as your research field is concerned, is that your regression model relies heavily on geographic areas but possibly omits other predictors (and this may explain the low within R-squared).
                              There's another, more substantive issue that may put your regression model at risk of endogeneity (reverse causation): while it's true that unemployment can explain variation in individuals' self-reported health (other things being equal), it is also possible that poor health explains variation in the unemployment rate (other things being equal). I would discuss this issue with your teacher/supervisor/mentor.
                              Kind regards,
                              Carlo
                              (Stata 19.0)

