  • How to do double clustering of standard errors and simultaneously control for firm fixed effects?

    Dear all

    I am running an OLS regression on a panel dataset with one observation per firm and calendar month. I want to cluster the standard errors at both the firm and the month level while also controlling for firm fixed effects. I am aware of the cluster2 and cgmreg commands in Stata for double clustering, but I haven't found a way to control for firm fixed effects with either of them. As I have 15,000+ firms, if I use 'xi: cluster2 y x i.firmid', Stata generates too many firm dummies, the running time is very long, and Stata ends up reporting the error below. I was wondering if there is an efficient way to do this, like 'areg, absorb', which does not generate and report those firm dummies. Any help would be much appreciated!

    "characteristic contents too long
    The maximum value of the contents is 67,784.
    characteristic contents too long
    The maximum value of the contents is 67,784.
    characteristic contents too long
    The maximum value of the contents is 67,784.
    matsize too small
    You have attempted to create a matrix with too many rows or columns or
    attempted to fit a model with too many variables. You need to increase
    matsize; it is currently 400. Use set matsize; see help matsize.

    If you are using factor variables and included an interaction that has
    lots of missing cells, either increase matsize or set emptycells drop
    to reduce the required matrix size; see help set emptycells.

    If you are using factor variables, you might have accidentally treated
    a continuous variable as a categorical, resulting in lots of
    categories. Use the c. operator on such variables."

    Best regards
    Shuang

  • #2
    You can use the user-written -xtivreg2-, which does two-way clustering with firm fixed effects. Even though this command is designed for panel IV estimation, you can trick it into fitting an ordinary panel model like this:

    Code:
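    * requires the user-written packages: ssc install ivreg2, ssc install xtivreg2
    use http://fmwww.bc.edu/ec-p/data/macro/abdata.dta, clear
    tsset id year
    * fixed effects on id; standard errors clustered on both id and year
    xtivreg2 ys k, fe cluster(id year)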
    Also, there's no need to use the xi prefix any longer, unless you are running a version of Stata older than 11, which introduced factor-variable notation, or are using an older user-written command that cannot handle it.
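
    Applied to the firm-month panel from the original question, the same call might look like the sketch below. The variable names y, x, firmid and month are assumptions (the question only shows y, x and firmid), and month is assumed to be a numeric monthly date variable.

    Code:
    * hypothetical variable names: y, x, firmid, month
    xtset firmid month
    * firm FEs are removed by demeaning, so no firm dummies are created or reported
    xtivreg2 y x, fe cluster(firmid month)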
    Last edited by Dimitriy V. Masterov; 20 Mar 2018, 19:54.


    • #3
      Dear Dimitriy

      Thank you very much for your help. This method works. The only problem is that the adj R-squared reported by xtivreg2 does not look correct: for my data it is -0.1%, whereas the adj R-sq reported by 'areg, absorb' with clustering only at the firm level is 30.9%. Because the models estimated by these two methods have the same coefficients, they should have the same adj R-sq. Is there any way to do what I want and also report the correct adj R-sq? Thank you very much!

      Best,
      Shuang


      • #4
        As far as I understand it, this is not a great comparison, for the following reason. If you include the fixed effects when calculating model fit, you will usually get a very high R^2 whether or not the model is very good. -areg- and -xtreg-/-xtivreg2- handle this differently because they rely on different asymptotics, even though people often take the fact that they yield the same coefficients to mean that they are the same command.

        -areg- treats the FEs as parameters to be estimated because it is designed for situations where the number of FEs does not increase with the sample. Think of people living in states, where you absorb the state FEs. As N grows (adding more people to your data), the number of states stays fixed. The OLS estimator with id dummies will give a similarly high R2, though the FEs will be reported since they are coefficients.

        -xtreg, fe- treats the FEs as nuisance parameters that don't really count as part of the model. Here the number of FEs does grow with the sample, since each time you add a person you add a FE, and you're not really estimating them, just eliminating them by demeaning the data. The R-squared reported by xtivreg2 for the fixed-effects estimation is the "within R-squared" obtained by estimating the equation in mean-deviation form. When most of the explanatory power comes from FEs that you are not really estimating, the high R2 is arguably illusory.

        The differences in approach are reflected in the R^2s reported by the two commands, and the comparison between them is problematic.
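
        A minimal sketch of the comparison, using the abdata example dataset from #2 rather than the firm-month data in the question. The slope on k should be the same in both fits, while the stored R^2 statistics differ because -areg- counts the id effects as estimated parameters and -xtreg, fe- does not.

        Code:
        use http://fmwww.bc.edu/ec-p/data/macro/abdata.dta, clear
        xtset id year
        * within estimator: FEs are removed by demeaning
        xtreg ys k, fe
        display "within R2 (xtreg, fe): " e(r2_w)
        * same slope, but the id effects count as part of the fitted model
        areg ys k, absorb(id)
        display "R2 (areg):      " e(r2)
        display "adj. R2 (areg): " e(r2_a)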


        • #5
          Dear Dimitriy

          Thanks again for your reply. Could you please explain what the centered and uncentered R2 reported by xtivreg2 are? For my data, they are both 1.27%, which differs from the -0.1% that the outreg2 command writes to Excel after estimation with xtivreg2. And the adj R2 reported by the areg command is 31%.


          • #6
            Take a look at the "Assessing goodness of fit" section of the PDF manual entry for -xtreg- for the formulas. Also, the SJ paper is good.

            I don't really use -outreg2- or Excel, so I cannot help on that front.

            -areg- is going to have much higher R^2 for reasons that I tried to explain above.
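
            For reference, the usual textbook definitions of the two statistics asked about in #5 are sketched below. Here \hat{u}_i denotes the residuals and y_i the dependent variable as it enters the estimated equation; these are the standard definitions and are not copied from the xtivreg2 documentation.

            \[
            R^2_{\text{centered}} = 1 - \frac{\sum_i \hat{u}_i^2}{\sum_i (y_i - \bar{y})^2},
            \qquad
            R^2_{\text{uncentered}} = 1 - \frac{\sum_i \hat{u}_i^2}{\sum_i y_i^2}
            \]

            After the within transformation the dependent variable has mean zero, so \bar{y} = 0 and the two expressions coincide, which would be consistent with both being reported as 1.27% in #5.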
