  • IVREGHDFE with different levels of observations

    Dear colleagues:
    I'm currently working on a project studying the effect of political polarization on firm investment. To address the endogeneity of polarization, I use a natural hazard instrumental variable. I'm now stuck on how to obtain the correct F-statistic for my instrument.

    My first-stage data are a panel of 50 states over 20 years, while my second-stage data (my main working data) are multidimensional, at the firm-by-state-by-year level. Not all 50 states from the first stage appear in the second stage, because no firms in my sample are headquartered in some of those states.
    In my first stage, I regress polarization on the natural hazard instrument with state and year fixed effects and cluster the standard errors at the state level. In my second stage, I regress firm investment on the polarization fitted from the first stage, plus firm and state controls, with firm, state, and year fixed effects, and cluster the standard errors at the state-by-year level.

    My main question is this: to obtain the correct F-stat for my instrument, should I first merge my instrument data into my second-stage data and then run
    Code:
    ivreghdfe investment (polarization=natural_hazard), absorb(state year) cluster(state)
    and use the reported KP F-stat?

    or should I run
    Code:
    ivreghdfe polarization (polarization=natural_hazard), a(state year) cl(state) first
    and use the reported KP F-stat from the first stage? In this approach I use the same variable (polarization) as both the outcome and the endogenous regressor, since I'm only interested in the first-stage regression results.

    My concern with the first approach is that each state-year observation is matched to multiple firm observations, so the regression is run on duplicated state-year observations, and I'm not certain whether this inflates or distorts my F-stat.
    My concern with the second approach is that not all states are present in my main (second-stage) data, so is an F-stat obtained from a regression using all 50 states valid?


    I greatly appreciate any help provided. And please let me know if anything is unclear!

  • #2
    ivreghdfe is from https://github.com/sergiocorreia/ivreghdfe, as you are asked to explain (FAQ Advice #12). In 2SLS IV regression, the first stage is not run independently of the second stage. Both stages use the same estimation sample.
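
To make the shared estimation sample concrete, here is a minimal sketch. It assumes the merged firm-by-state-by-year data are in memory, reuses the variable names from the question, and assumes ivreghdfe leaves ivreg2-style stored results such as e(widstat) behind; insample is a hypothetical variable name introduced only for illustration.

```stata
* Sketch under stated assumptions: merged firm-level data in memory,
* variable names as in the question above.
ivreghdfe investment (polarization = natural_hazard), absorb(state year) cluster(state) first

* Save the estimation sample; both stages of the IV fit use these observations
* (insample is a hypothetical name).
generate byte insample = e(sample)
count if insample

* Weak-identification F as stored by ivreg2-style commands
* (assumed to be available after ivreghdfe).
display e(widstat)

* First stage run by hand on exactly the same observations, for comparison
reghdfe polarization natural_hazard if insample, absorb(state year) vce(cluster state)
```

The hand-run first stage is only a cross-check that the sample is the same; its test statistic may differ slightly from the reported KP F-stat because of small-sample adjustments.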

    My concern with the first approach is that each state-year observation is matched to multiple firm observations, so the regression is run on duplicated state-year observations, and I'm not certain whether this inflates or distorts my F-stat.
    This sounds more like an issue of identifying the structure of your data. If you have firm-level panel data and decide to run a pooled analysis, you should ensure that there is no firm-level heterogeneity. However, all of this relates to your model specification, which should be addressed before estimation.
