Which number of cluster is more relevant for a regression analysis (firm or sector)?

Doris Rivera

Join Date: Feb 2020
Posts: 172

14 Apr 2023, 13:00

Dear Statalist, I am running a panel regression on more than 2,500 firms in 18 manufacturing sectors. My dependent variable and the variable "x" are measured at the firm level, while the remaining independent variables vary at the sector-year level. I understand that more clusters are better, but how many are enough for a regression analysis? Most people seem to say more than 50, while some say more than 30. In this post (https://www.statalist.org/forums/for...tandard-errors) Clyde Schechter even comments that 15 is borderline, which is consistent with some simulation studies for multilevel analysis.

However, I would like to understand what I am doing, so I would like to ask whether the number of cases within each cluster (here, firms) also matters. That is, despite the low number of clusters, does it help to have a decent number of cases (more than 50 firms) per cluster?

I have also read that one possible solution to a small number of clusters is to bootstrap the standard errors. I am estimating the model below with the reghdfe command. However, after reading its help file, I think it does not support bootstrapped standard errors.

Do you think that 18 clusters are enough?
Is there a way to obtain bootstrapped standard errors for all the betas when using this command (even after estimation)?
Do you know of any other possible solution to the small-number-of-clusters problem?

Thanks in advance for your help!

Code:

HDFE Linear regression                            Number of obs   =     25,790
Absorbing 2 HDFE groups                           F(   9,     17) =      15.53
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.0773
                                                  Adj R-squared   =    -0.0403
                                                  Within R-sq.    =     0.0011
Number of clusters (sectors) =         18         Root MSE        =     0.3672

                                 (Std. Err. adjusted for 18 clusters in sectors)
--------------------------------------------------------------------------------
               |               Robust
             y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
             x |
           L1. |  -.1010899   .0786277    -1.29   0.216    -.2669798       .0648
               |
        intra1 |
           L1. |  -.0139542   .0195475    -0.71   0.485    -.0551958    .0272874
               |
        intra2 |
           L1. |   .0012664   .0041558     0.30   0.764    -.0075015    .0100344
               |
        inter1 |
           L1. |    .025548   .0229927     1.11   0.282    -.0229625    .0740585
               |
        inter2 |
           L1. |   .0643031   .0404676     1.59   0.130    -.0210762    .1496823
               |
cL.x#cL.intra1 |   .0204768   .0834788     0.25   0.809    -.1556481    .1966017
               |
cL.x#cL.intra2 |  -.2086073   .0585233    -3.56   0.002    -.3320806   -.0851339
               |
cL.x#cL.inter1 |  -.0214449   .0684947    -0.31   0.758    -.1659561    .1230663
               |
cL.x#cL.inter2 |   .0259349   .1128773     0.23   0.821    -.2122154    .2640853
               |
         _cons |   .0331025    .021017     1.58   0.134    -.0112395    .0774446
--------------------------------------------------------------------------------


George Ford

Join Date: Aug 2014

Posts: 3148
#2

20 Apr 2023, 13:34

ssc install boottest

but you'll need to use xtreg and include one of your fixed effects as indicator variables (i.varname).

Bootstrapped SEs are rarely smaller, and almost nothing is statistically significant here.

18 is probably enough, but you could cluster on firms.
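
A minimal sketch of that workflow, under stated assumptions: the variable names firm, year and sector are hypothetical (the thread does not show the actual ones), the two absorbed effects are assumed to be firm and year, and only the lag of x is included to keep the example short. It shows a wild cluster bootstrap test after xtreg rather than a full set of bootstrapped standard errors.

```stata
* Sketch only: firm, year and sector are assumed variable names
ssc install boottest, replace

xtset firm year
generate x_lag = L.x          // build the lag explicitly so it can be named in the test

* Firm effects via -fe-, year effects as indicators, SEs clustered on sector
xtreg y x_lag i.year, fe vce(cluster sector)

* Wild cluster bootstrap test of H0: coefficient on x_lag = 0.
* By default boottest resamples at the clustering level used in estimation.
boottest x_lag, reps(999) seed(12345) nograph
display "wild bootstrap p-value = " r(p)
```

The bootstrap p-value can then be set against the conventional clustered p-value from the xtreg output; with few clusters the two may differ noticeably.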
Doris Rivera

Join Date: Feb 2020

Posts: 172
#3

24 Apr 2023, 01:22

Dear George Ford, thanks for your answer! It worked perfectly, even though, as you said, the SEs are much higher. Can I ask you for a reference about using around 18 clusters? Most of the references I have seen talk about 30 or more. The only ones that mention around 20 come from the multilevel literature, and I do not know whether the same applies here (FEs).
About your suggestion to cluster at the firm level, I am not so sure I should, since almost all independent variables are at the sector-year level (even though the dependent variable is at the firm-year level).

Anyway, thanks for your help!
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#4

24 Apr 2023, 05:38

Doris:
I do not think that 18 clusters are enough to rule out the risk that cluster-robust standard errors may be more misleading than their default counterparts.
I do share George's wise recommendation to cluster on firms, which I assume to be your -panelvar-.

Kind regards,
Carlo
(Stata 19.0)
Doris Rivera

Join Date: Feb 2020

Posts: 172
#5

24 Apr 2023, 08:32

Dear Carlo Lazzaro, thanks for the advice. Yes, you are right, the firm ID is the panelvar. And yes, I can certainly cluster the errors at the firm level. But I am wondering whether, given that the main independent variables are at the sector level (sector-year), I am in some way forced to acknowledge the very likely correlation within sectors (several firms belong to the same sector). My understanding is that, since I am using firm FEs, I am controlling for anything that is sector specific (given that firms do not change sectors) as well as firm specific, and thus it is not necessary to cluster the errors at the firm level (since the firm FEs control for unobserved heterogeneity at the firm level).

I have also read in some posts by Clyde Schechter that I could run the regression without clustered errors and check whether there are substantial changes. If I understood correctly, the idea is that if there is not much difference, then I am fine clustering the errors. But I do not know how to judge what counts as a substantial change here. So, how much difference in the SEs is enough to justify using (or not using) clustered errors?

So, clustering the errors with only 18 groups could end up being more dangerous than not clustering them? This is why you advise clustering at the firm level, right?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17707
#6

24 Apr 2023, 12:10

Doris:
starting from the bottom of your reply:
1) yes, that was the meaning of my advice;
2) Clyde's advice is obviously correct. Unfortunately, only experience with similar data can give you a sense of the relevance of the difference between default and cluster-robust standard errors, so that you can decide which approach to follow;
3) as far as your first point is concerned, I would not create a -sector-year- predictor, but simply plug -i.year- in the right-hand side of my regression equation. In fact, as it is unlikely that firms change industry as time goes by, industry is a time-invariant predictor to be wiped out by the -fe- machinery.
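
The comparison in point 2 can be sketched by storing the same model under different variance estimators and listing the standard errors side by side. This is only an illustration: firm, year and sector are hypothetical variable names, the model is cut down to a single lagged regressor, and how large a difference matters remains a judgment call.

```stata
* Sketch only: firm, year and sector are assumed variable names
xtset firm year

xtreg y L.x i.year, fe                        // default (conventional) SEs
estimates store se_default

xtreg y L.x i.year, fe vce(cluster firm)      // clustered on the panel variable
estimates store se_firm

xtreg y L.x i.year, fe vce(cluster sector)    // clustered on the 18 sectors
estimates store se_sector

* Same coefficients in every column; only the standard errors differ
estimates table se_default se_firm se_sector, b(%9.4f) se(%9.4f)
```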

Kind regards,
Carlo
(Stata 19.0)

George Ford

Join Date: Aug 2014
Posts: 3148
#7

24 Apr 2023, 17:45

HTML Code:

https://declaredesign.org/blog/how-misleading-are-clustered-ses-in-designs-with-few-clusters.html
https://link.springer.com/article/10.3758/s13428-021-01627-0

I sometimes do this type of analysis on few cluster data to check how things work.
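
A rough sketch of what such a check could look like in Stata, not the code referred to in this thread. Everything in the design is an assumption made for illustration: 18 sectors, 20 firms per sector, 10 years, a sector-year regressor z whose true effect is zero, and an AR(1) coefficient of 0.7 for both z and the sector-year shock. The program name fewclust_sim is hypothetical as well.

```stata
clear all
set seed 12345

capture program drop fewclust_sim
program define fewclust_sim, rclass
    drop _all
    * 18 sectors observed for 10 years
    set obs 18
    generate sector = _n
    expand 10
    bysort sector: generate year = _n
    * Persistent sector-year regressor and shock (AR(1), rho = 0.7)
    bysort sector (year): generate z = rnormal() if _n == 1
    by sector (year): replace z = 0.7*z[_n-1] + rnormal() if _n > 1
    bysort sector (year): generate shock = rnormal() if _n == 1
    by sector (year): replace shock = 0.7*shock[_n-1] + rnormal() if _n > 1
    * 20 firms per sector
    expand 20
    bysort sector year: generate firm_in_sector = _n
    egen firm = group(sector firm_in_sector)
    * Outcome does not depend on z, so the null hypothesis is true
    generate y = shock + rnormal()
    xtset firm year
    xtreg y z i.year, fe vce(cluster sector)
    * 1 if the true null is rejected at the 5% level, 0 otherwise
    return scalar reject = (2*ttail(e(df_r), abs(_b[z]/_se[z])) < 0.05)
end

simulate reject = r(reject), reps(500) nodots: fewclust_sim
summarize reject    // the mean is the empirical rejection rate
```

If the clustered standard errors behave well under this design, the mean of reject should be close to 0.05; a clearly higher value would suggest over-rejection with 18 clusters. Changing the number of sectors in set obs shows how the rejection rate moves with the cluster count.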

Doris Rivera

Join Date: Feb 2020

Posts: 172
#9

25 Apr 2023, 02:30

Thanks to both of you for the suggestions. It would be really nice to know how to replicate something like this simulation with the structure of my data, but I think this is beyond my basic level of Stata. It is a really good idea, though!
George Ford

Join Date: Aug 2014

Posts: 3148
#10

25 Apr 2023, 11:03

It's not that difficult. Shoot me a private message and I'll share some code.