Suppose I have data on all companies in an entire country (say, 1M companies), so it is very nearly the entire population.
I want to run a DID that studies the impact of a shock that happened in year 2000 on companies' log sales.
Each company belongs to one of 100 industries.
There is a measure of exposure to the shock at the industry level. Call it "industry_exposure". This means that all companies in the same industry share the same level of "industry_exposure".
For those 1M companies, I run this:
Code:
reghdfe lsales industry_exposure post industry_exposureXpost, absorb(id year) cluster(???)
where post equals 1 if year>=2000, industry_exposureXpost is the DID term of interest, and "id" is the company id.
I have company FE and year FE.
At which level should I cluster?
I read Prof. Wooldridge's answer (@Jeff Wooldridge) in this thread:
Appropriate Dimension for Clustering of Standard Errors - Statalist
According to that answer, there are two questions to ask:
1. Have the data been obtained from cluster sampling?
2. What is the level of assignment of the key explanatory variables?
Regarding #1, the data are not cluster sampled. They cover almost the full set of companies in the country. So question #1 doesn't justify any clustering (right?).
Regarding #2, does the answer depend on whether I draw a sample or use essentially the entire population? I use (almost) the entire population of companies.
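To make the question concrete, here is a minimal sketch of the two candidates I am weighing. It assumes a variable named "industry" (a hypothetical name, not part of the command above) that holds each company's industry code. It keeps only the DID term for brevity. This is just an illustration of the choice, not a claim about which one is right.

```stata
* Sketch only: "industry" is a hypothetical variable with each company's industry code (100 values).

* Candidate A: cluster at the company level (the panel unit).
reghdfe lsales industry_exposureXpost, absorb(id year) vce(cluster id)

* Candidate B: cluster at the industry level, the level at which industry_exposure is assigned.
reghdfe lsales industry_exposureXpost, absorb(id year) vce(cluster industry)
```

Both commands give the same coefficient on industry_exposureXpost and differ only in the standard errors. Candidate B would rely on 100 clusters, one per industry.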