
  • Cluster using xtreg or vce

    Hello,

    I am analysing a dataset to understand the relationship between a variable ind and the outcome y, which may involve clustering at the state level. I am confused about the difference between using xtreg, re and xtreg, re vce(cluster state). When I only use xtreg, re (see below), the SEs are smaller than those from OLS with cluster-robust SEs using vce(cluster state), which I understood should be the opposite. Here's my code:

    regress y ind smoke i.edu i.inc sex years, vce(cluster state)
    xtset state
    xtreg y ind smoke i.edu i.inc sex years

    Should I be including vce(cluster state) again after my xtreg?
    NOTE: this is not panel data; it is cross-sectional.

    Thank you,
    Hania
    Last edited by Hania ElBanhawi; 04 Jan 2022, 08:50.

  • #2
    Hania:
    welcome to this forum.
    Are you dealing with a cross-sectional or a panel dataset? Only the latter requires -xtset-ting your data and then going -xtreg-.
    Conversely, if your dataset is cross-sectional, you should go -regress- (assuming that your regressand is continuous).
    In addition, both commands allow clustered standard errors (which are in fact cluster-robust under -xtreg-); however, you should have at least 30 clusters for them to work properly.
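    For concreteness, a minimal sketch of both routes with clustered standard errors (just a sketch, reusing the variable names from the code in #1):
    Code:
    * cross-sectional route: OLS with SEs clustered at the state level
    regress y ind smoke i.edu i.inc sex years, vce(cluster state)

    * panel-style route: declare the grouping variable, then RE with cluster-robust SEs
    xtset state
    xtreg y ind smoke i.edu i.inc sex years, re vce(cluster state)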
    The usual aside is to read and act on the FAQ when posting.
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Hello Carlo,

      Thank you for your response! I am dealing with a cross-sectional dataset with clustering (50 US states).
      Thank you as well for the warm welcome and the pointer to the FAQs.

      Hania



      • #4
        Hania:
        you should go -regress- with standard errors clustered on US States:
        Code:
        regress <depvar> <indepvars> <potential_controls>, vce(cluster US_states)
        Kind regards,
        Carlo
        (StataNow 18.5)



        • #5
          Just to clarify: As a general rule, it is permissible to use random effects estimators when you have a cluster structure that is not a panel data structure. In fact, the random effects variance-covariance matrix is a bit more believable because there cannot be serial correlation as in a panel data setting. But if one does this, it is only fair to compare cluster-robust standard errors in both cases. In my experience, using vce(cluster id) for OLS but the nonrobust standard errors for RE leads to just what Hania found: the nonrobust standard errors from RE are smaller. It doesn't have to be that way, but assuming all of the GLS assumptions are true often biases the standard errors downward. So xtreg y x1 ... xK, re vce(cluster id) should be used with panel data or other clustering.
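          In Stata terms, one way to make that comparison explicit (a sketch only, reusing the variable names from #1) is:
          Code:
          * OLS with cluster-robust standard errors
          regress y ind smoke i.edu i.inc sex years, vce(cluster state)
          estimates store ols_cl

          * RE (GLS) with its default, nonrobust standard errors
          xtset state
          xtreg y ind smoke i.edu i.inc sex years, re
          estimates store re_default

          * RE with cluster-robust standard errors -- the fair comparison
          xtreg y ind smoke i.edu i.inc sex years, re vce(cluster state)
          estimates store re_cl

          * coefficients and standard errors side by side
          estimates table ols_cl re_default re_cl, b(%9.4f) se(%9.4f)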

          Having said that, if you have a large cross section with not so many states -- in the U.S., G = 50 states -- then you shouldn't use random effects. Ideally, you would include state fixed effects, but if you're studying a policy that changes only at the state level then you can't include state fixed effects. In the end, you should use Carlo's suggestion. But be cautioned that G > 30 might not be enough for clustering to work well if you have large group sizes. How many individuals do you have per state, on average?
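          If the fixed-effects route is feasible (i.e., ind varies within states), a sketch would look like this, again borrowing the variable names from #1:
          Code:
          * pooled OLS with state dummies and cluster-robust SEs
          regress y ind smoke i.edu i.inc sex years i.state, vce(cluster state)
          * caution: a regressor that varies only at the state level is collinear with i.state and will be dropped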



          • #6
            Thank you Carlo and Jeff for your insights!

            I am using a cross-sectional dataset for just one year, with the average cluster size being 2,955.7. I have 50 states - would you say that's not enough to use random effects?


            Just so I know (even if I'm not using RE here): you mention that xtreg y x1 ... xK, re vce(cluster id) should be used... should vce(cluster id) always be added to xtreg? Is that the SE adjustment?

            Thank you again



            • #7
              Hania:
              1) with, on average, 2,955.7 observations per cluster, I would go -regress- with standard errors clustered at state level;
              2) under -xtreg-, the -vce(cluster clusterid)- option for SE takes heteroskedasticity and/or autocorrelation into account. In your case (assuming that you want to apply an -xt- command to a cross-sectional dataset), I would go -vce(cluster clusterid)-.
              Kind regards,
              Carlo
              (StataNow 18.5)



              • #8
                That makes sense, thank you Carlo!
                In general, is there a number of individuals per cluster, or a number of clusters (or a ratio between the two) that makes RE contraindicated?

                Thanks again!



                • #9
                  Hania:
                  It seems that you mixed up cluster-robust standard errors with the -re- specification.
                  Could you please clarify? Thanks.
                  Kind regards,
                  Carlo
                  (StataNow 18.5)



                  • #10
                    Hi Carlo--
                    Thank you for the note. Essentially I see evidence of heteroskedasticity (so wanted to use cluster robust SEs) but I also wanted to explore if there is any unobserved heterogeneity at state-level using RE.



                    • #11
                      Hania:
                      1) if you detected heteroskedasticity in a cross-sectional study and your regressand is continuous, you should go -regress- with -robust- standard errors;
                      2) if you suspect that your cross-sectional study suffers from correlated errors within clusters (I surmise this is what you mean by unobserved heterogeneity), you should go -regress, vce(cluster state)-;
                      3) if you detected heteroskedasticity in a panel dataset and your regressand is continuous, you should go -xtreg- with -robust- or -vce(cluster clusterid)- standard errors (see the sketch below).
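                      A compact sketch of the three cases (variable and cluster names are placeholders):
                      Code:
                      * 1) cross-section, heteroskedasticity only
                      regress y x1 x2, vce(robust)

                      * 2) cross-section with correlated errors within states
                      regress y x1 x2, vce(cluster state)

                      * 3) panel data with heteroskedasticity (under -xtreg-, vce(robust) is cluster-robust)
                      xtset panel_id time_var
                      xtreg y x1 x2, re vce(cluster panel_id)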
                      Kind regards,
                      Carlo
                      (StataNow 18.5)



                      • #12
                        Carlo,
                        That's very clear! Thank you so much.
                        Hania



                        • #13
                          You have almost 3,000 observations per cluster and only 50 clusters. I doubt that the clustered standard errors work very well. If you want to allow for heteroskedasticity then just use vce(robust), as Carlo said.



                          • #14
                            Hi Jeff,
                            Thank you for the helpful explanation. Understood. May I ask, with a total of ~150,000 observations, how do we determine that 50 clusters is not enough?



                            • #15
                              Originally posted by Jeff Wooldridge (#5 above)
                              Dear Prof. Wooldridge, I have a similar question. My data cover 10 years for 300 firms. A Hausman test indicated that random effects (RE) is appropriate, and intuitively RE also makes sense for my variables. However, if I cluster by firm id, I get a lot of significant results compared to when I do not cluster. How does one justify the use of clustering by firm id? What if I had only 3 years of data? Please share your views.

