Dear Statalist, I am running a panel-data regression for more than 2,500 firms in 18 manufacturing sectors. My dependent variable is measured at the firm level, as is the variable "x", while the remaining independent variables vary at the sector-year level. I understand that the more clusters the better, but how many are enough for a regression analysis? Most people I have seen say more than 50, while others say more than 30. In this post (https://www.statalist.org/forums/for...tandard-errors), Clyde Schechter even comments that 15 is borderline, which is consistent with some simulation studies for multilevel analysis.
However, I would like to be sure that I understand what I am doing, so I would like to ask whether the number of cases within each cluster (here, firms) also matters. That is, given a low number of clusters, does it help to have a decent number of cases (more than 50 firms) per cluster?
I have also read that one possible remedy for a small number of clusters is to bootstrap the standard errors. I am estimating the model shown in the output at the end of this post using the reghdfe command. However, after reading its help file, I think it does not support bootstrapped standard errors.
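To make concrete what "bootstrapping" could mean with few clusters: one commonly discussed variant is the wild cluster bootstrap of the t-statistic with the null hypothesis imposed. The following is only a minimal sketch under stated assumptions, not reghdfe functionality and not the model above. It is written in Python with NumPy purely for illustration, it ignores the absorbed fixed effects, it uses simulated data, and the function names `cluster_t` and `wild_cluster_boot_p` are hypothetical.

```python
import numpy as np

def cluster_t(X, y, groups, j):
    """OLS t-statistic for coefficient j using cluster-robust (CR1) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    labels = np.unique(groups)
    G = len(labels)
    meat = np.zeros((k, k))
    for g in labels:
        mask = groups == g
        score = X[mask].T @ resid[mask]   # sum of x_i * u_i within the cluster
        meat += np.outer(score, score)
    adj = G / (G - 1) * (n - 1) / (n - k)  # small-sample correction
    V = adj * XtX_inv @ meat @ XtX_inv
    return beta[j] / np.sqrt(V[j, j])

def wild_cluster_boot_p(X, y, groups, j, reps=999, seed=0):
    """Bootstrap p-value for H0: beta_j = 0, imposing the null (assumed variant)."""
    rng = np.random.default_rng(seed)
    t_obs = cluster_t(X, y, groups, j)
    # Restricted fit: drop column j so that beta_j = 0 holds by construction.
    Xr = np.delete(X, j, axis=1)
    beta_r, *_ = np.linalg.lstsq(Xr, y, rcond=None)
    fitted_r = Xr @ beta_r
    resid_r = y - fitted_r
    labels, idx = np.unique(groups, return_inverse=True)
    t_boot = np.empty(reps)
    for b in range(reps):
        # One Rademacher draw (+1 or -1) per cluster, shared by all its observations.
        w = rng.choice([-1.0, 1.0], size=len(labels))
        y_star = fitted_r + w[idx] * resid_r
        t_boot[b] = cluster_t(X, y_star, groups, j)
    p_value = np.mean(np.abs(t_boot) >= np.abs(t_obs))
    return t_obs, p_value

# Simulated example: 18 clusters of 50 observations, true slope equal to zero.
sim = np.random.default_rng(1)
G, m = 18, 50
groups = np.repeat(np.arange(G), m)
x = sim.normal(size=G * m) + sim.normal(size=G)[groups]
y = sim.normal(size=G)[groups] + sim.normal(size=G * m)
X = np.column_stack([np.ones(G * m), x])

t_obs, p_value = wild_cluster_boot_p(X, y, groups, j=1)
print(f"t = {t_obs:.3f}, wild cluster bootstrap p = {p_value:.3f}")
```

The weights are drawn once per cluster rather than once per observation, so the within-cluster dependence of the residuals is preserved in every bootstrap sample.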
Do you think that 18 clusters are enough?
Is there a way to obtain bootstrapped standard errors for all the coefficients when using this command (even after estimation)?
Do you know of any other possible solution to the problem of a small number of clusters?
Thanks in advance for your help!
Code:
HDFE Linear regression                            Number of obs   =     25,790
Absorbing 2 HDFE groups                           F(   9,     17) =      15.53
Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                  R-squared       =     0.0773
                                                  Adj R-squared   =    -0.0403
                                                  Within R-sq.    =     0.0011
Number of clusters (sectors) =         18         Root MSE        =     0.3672

                                 (Std. Err. adjusted for 18 clusters in sectors)
--------------------------------------------------------------------------------
               |               Robust
             y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
             x |
           L1. |  -.1010899   .0786277    -1.29   0.216    -.2669798       .0648
               |
        intra1 |
           L1. |  -.0139542   .0195475    -0.71   0.485    -.0551958    .0272874
               |
        intra2 |
           L1. |   .0012664   .0041558     0.30   0.764    -.0075015    .0100344
               |
        inter1 |
           L1. |    .025548   .0229927     1.11   0.282    -.0229625    .0740585
               |
        inter2 |
           L1. |   .0643031   .0404676     1.59   0.130    -.0210762    .1496823
               |
cL.x#cL.intra1 |   .0204768   .0834788     0.25   0.809    -.1556481    .1966017
               |
cL.x#cL.intra2 |  -.2086073   .0585233    -3.56   0.002    -.3320806   -.0851339
               |
cL.x#cL.inter1 |  -.0214449   .0684947    -0.31   0.758    -.1659561    .1230663
               |
cL.x#cL.inter2 |   .0259349   .1128773     0.23   0.821    -.2122154    .2640853
               |
         _cons |   .0331025    .021017     1.58   0.134    -.0112395    .0774446
--------------------------------------------------------------------------------