Dear colleagues:
I'm currently working on a project studying the effect of political polarization on firm investment. To address the endogeneity of polarization, I use a natural-hazard instrumental variable. I'm now stuck on choosing the correct way to obtain the first-stage F-statistic for my instrument.
My first-stage data are a 50-state by 20-year panel, while my second-stage data (my main working data) are a multidimensional firm-by-state-by-year panel. Not all 50 states from the first stage appear in the second stage, because no firms headquartered in those states are in the sample.
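A minimal sketch of how the two data sets could be matched, and how the states without sample firms could be listed. The file names `firm_panel.dta` and `state_panel.dta` are hypothetical placeholders, and both files are assumed to contain `state` and `year`.

```stata
* Hypothetical file names: firm_panel.dta (firm-state-year) and
* state_panel.dta (50 states x 20 years, one row per state-year)
use firm_panel, clear
merge m:1 state year using state_panel, keepusing(polarization natural_hazard)

* _merge == 2 marks state-years found only in the state panel,
* i.e. state-years with no firms in the sample
tabulate state if _merge == 2

* Keep the firm-level observations that found a state-year match
keep if _merge == 3
drop _merge
```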
In my first stage, I regress polarization on the natural-hazard instrument with state and year fixed effects and cluster the standard errors at the state level. In my second stage, I regress firm investment on polarization fitted from the first stage, plus firm and state controls, with firm, state, and year fixed effects, and cluster the standard errors at the state-by-year level.
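My main question is this: to obtain the correct F-statistic for my instrument, should I first match my instrument data to my second-stage data and then run
Code:
ivreghdfe investment (polarization = natural_hazard), absorb(state year) cluster(state)
and use the reported Kleibergen-Paap (KP) F-stat? Or should I run
Code:
ivreghdfe polarization (polarization = natural_hazard), a(state year) cl(state) first
on the state-year panel and use the reported KP F-stat from the first stage? In this approach I'm using the same variable (polarization) as both the outcome and the endogenous regressor, since I'm only interested in the first-stage regression results.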
My concern with the first approach is that each state-year observation is matched to multiple firm observations, so the regression runs on duplicated state-year observations, and I'm not certain whether this inflates or otherwise distorts my F-stat.
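One way to probe this concern is to compare the first stage on the full firm-level data with the same regression on a single row per state-year. This is only a sketch: it assumes the matched firm-level data are in memory and already contain `polarization` and `natural_hazard`, and the variable `one_per_sy` is a hypothetical helper. With state-clustered standard errors, repeated state-year rows should not count as independent information, but they do weight each state-year by its number of firms, so the two F-stats need not coincide.

```stata
* (a) First stage on the matched firm-level data (state-year rows repeated per firm)
reghdfe polarization natural_hazard, absorb(state year) vce(cluster state)
test natural_hazard

* (b) Same regression on one row per state-year
* tag() flags exactly one observation within each state-year cell
egen byte one_per_sy = tag(state year)
reghdfe polarization natural_hazard if one_per_sy, absorb(state year) vce(cluster state)
test natural_hazard
```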
My concern with the second approach is that not all states are present in my main (second-stage) data: is an F-stat obtained from a regression using all 50 states valid?
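A possible middle ground, sketched under the same hypothetical file names (`firm_panel.dta`, `state_panel.dta`), is to run the first stage on the state-year panel restricted to the states that actually appear in the firm sample. This is an illustration of the idea rather than a recommendation of which F-stat to report.

```stata
* Build the list of states that appear in the firm sample
use firm_panel, clear
keep state
duplicates drop
tempfile sample_states
save `sample_states'

* Restrict the state-year panel to those states
use state_panel, clear
merge m:1 state using `sample_states', keep(match) nogenerate

* First stage on one row per state-year, clustered by state
reghdfe polarization natural_hazard, absorb(state year) vce(cluster state)

* Cluster-robust F-test on the single excluded instrument
test natural_hazard
```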
I would greatly appreciate any help, and please let me know if anything is unclear!