Hi all,
I am attempting to replicate results from a paper to extend its analysis in subsequent research. Currently, I’m running into challenges with my 2SLS regressions and would greatly appreciate your insights. I suspect my code or dataset is at fault, as I get meaningful results for my OLS regressions.
As an example, I am analyzing the effect of climate patents being granted on cumulative abnormal returns (CARs) over the subsequent 18 months. For this, I split the data into terciles based on the Media Coverage of Climate Change (MCCC), represented by three dummy variables: MCCC_H, MCCC_M, and MCCC_L.
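For reference, a minimal sketch of how tercile dummies of this kind can be built. The variable names MCCC (the raw coverage index) and MCCC_tercile are assumptions for illustration, and the sketch forms terciles over all pooled observations, which may differ from the paper's exact definition:

```stata
* Sketch only: MCCC and MCCC_tercile are assumed (hypothetical) names
* Assign each observation to a tercile of the raw coverage index
xtile MCCC_tercile = MCCC, nquantiles(3)

* One dummy per tercile; left missing where the index is missing
generate byte MCCC_L = (MCCC_tercile == 1) if !missing(MCCC_tercile)
generate byte MCCC_M = (MCCC_tercile == 2) if !missing(MCCC_tercile)
generate byte MCCC_H = (MCCC_tercile == 3) if !missing(MCCC_tercile)

* The three dummies are mutually exclusive and exhaustive
assert MCCC_L + MCCC_M + MCCC_H == 1 if !missing(MCCC_tercile)
```

Because the three dummies always sum to one, each one is an exact linear combination of the other two and the constant.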
When conducting the 2SLS regressions, I am encountering two primary problems:
- Very high standard errors in some cases
- No results at all due to the covariance matrix not being of full rank (example regression output included below).
Data:
The dataset consists of firm-month panel data with approximately 25,000 observations in total. However, each regression excludes the majority of observations and typically uses 7,500–8,000 observations per CAR regression.
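A rough sketch of how the sample loss could be traced, using CAR_9_w as the example outcome. The helper variable n_ind_month is a hypothetical name, and the singleton count below covers only one of the three fixed-effect dimensions on the full sample, so it will not reproduce the number reported by ivreghdfe exactly:

```stata
* Sketch only: count observations with complete data on all regression variables
count if !missing(CAR_9_w, ln_Num_Pats_Granted_w, avg_leniency, ///
    MCCC_H, MCCC_M, MCCC_L, ///
    market_cap_y_lag12, tobins_q_lag12, cash_ratio_lag12, roa_lag12, ///
    rd_ratio_lag12, past_12m_ret, past_12m_sd, EnvSc_yearl1, ///
    IND_ID, month, art_unit_num, dec_year, Num_Pat_App)

* Industry x month cells that contain a single observation (singletons)
bysort IND_ID month: generate long n_ind_month = _N
count if n_ind_month == 1
```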
Model:
I am using a two-stage least squares (2SLS) regression with extensive fixed effects and firm-level controls. My primary Stata command is ivreghdfe. Here’s an outline of the regression setup:
Dependent variable: CAR_k_w, from time t to time t + k
Main independent variable: ln_Num_Pats_Granted_w instrumented using avg_leniency
Firm Controls: market_cap_y_lag12 tobins_q_lag12 cash_ratio_lag12 roa_lag12 rd_ratio_lag12 past_12m_ret past_12m_sd EnvSc_yearl1
Fixed effects: Industry × Month F.E., Art Unit × Year F.E., and Num_Pat_App FE
Standard Errors: Double-clustered at art-unit and industry-year level (represented by egen cluster_var = group(IND_ID dec_year))
Partialing Out: I am currently partialing out my firm controls. However, I have to admit I'm not too familiar with this type of procedure.
Equation for the second-stage regression:
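Written out from the regression code in this post, the intended second stage can be sketched as follows. The notation is shorthand assumed for illustration (i indexes firms, t months, X the firm controls, and the alpha terms the three sets of fixed effects), not the paper's own equation:

```latex
\begin{aligned}
CAR_{i,\,t \to t+k} ={}& \sum_{j \in \{H, M, L\}} \beta_j \,
  \widehat{\ln(\mathit{NumPatsGranted})}_{i,t} \times MCCC_{j,t} \\
& + \gamma_H \, MCCC_{H,t} + \gamma_M \, MCCC_{M,t}
  + \delta' X_{i,t} \\
& + \alpha_{\mathit{ind} \times \mathit{month}}
  + \alpha_{\mathit{artunit} \times \mathit{year}}
  + \alpha_{\mathit{NumPatApp}} + \varepsilon_{i,t}
\end{aligned}
```

Here each interaction of the log patent count with a tercile dummy is instrumented by avg_leniency interacted with the same dummy.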
Regression Code:
Code:
* Run regressions
forvalues k = 0/18 {
    * IV regression with interactions
    ivreghdfe CAR_`k'_w ///
        MCCC_H MCCC_M ///
        market_cap_y_lag12 tobins_q_lag12 cash_ratio_lag12 roa_lag12 ///
        rd_ratio_lag12 past_12m_ret past_12m_sd EnvSc_yearl1 ///
        (c.ln_Num_Pats_Granted_w#i.MCCC_H ///
         c.ln_Num_Pats_Granted_w#i.MCCC_M ///
         c.ln_Num_Pats_Granted_w#i.MCCC_L = ///
         c.avg_leniency#i.MCCC_H ///
         c.avg_leniency#i.MCCC_M ///
         c.avg_leniency#i.MCCC_L) ///
        , absorb(IND_ID#month art_unit_num#dec_year Num_Pat_App) ///
        cluster(art_unit_num cluster_var) ///
        partial(market_cap_y_lag12 tobins_q_lag12 cash_ratio_lag12 roa_lag12 rd_ratio_lag12 past_12m_ret past_12m_sd EnvSc_yearl1)
}
Example Output:
Code:
IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on art_unit_num and cluster_var

Number of clusters (art_unit_num) =    378            Number of obs =     7787
Number of clusters (cluster_var) =     204            F(  3,   203) =     0.43
                                                      Prob > F      =   0.7292
Total (centered) SS     =  249.3154017                Centered R2   =  -0.0248
Total (uncentered) SS   =  249.3154017                Uncentered R2 =  -0.0248
Residual SS             =  255.4890154                Root MSE      =    .1821

------------------------------------------------------------------------------------------------
                               |               Robust
                       CAR_9_w | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------------------------+----------------------------------------------------------------
MCCC_H#c.ln_Num_Pats_Granted_w |
                            0  |   .0509095   .1007255     0.51   0.614    -.1476928    .2495118
                            1  |  -.0557951   .1398749    -0.40   0.690    -.3315891    .2199989
                               |
MCCC_M#c.ln_Num_Pats_Granted_w |
                            0  |   .0669551   .1072587     0.62   0.533     -.144529    .2784391
                               |
                        MCCC_H |          0  (omitted)
                        MCCC_M |          0  (omitted)
------------------------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):             17.629
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               19.417
                         (Kleibergen-Paap rk Wald F statistic):         11.216
Stock-Yogo weak ID test critical values:                       <not available>
------------------------------------------------------------------------------
Warning: estimated covariance matrix of moment conditions not of full rank.
         overidentification statistic not reported, and standard errors and
         model tests should be interpreted with caution.
Possible causes:
         number of clusters insufficient to calculate robust covariance matrix
         singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
partial option may address problem.
------------------------------------------------------------------------------
Collinearities detected among instruments: 2 instrument(s) dropped
Instrumented:         0b.MCCC_H#c.ln_Num_Pats_Granted_w
                      1.MCCC_H#c.ln_Num_Pats_Granted_w
                      0b.MCCC_M#c.ln_Num_Pats_Granted_w
Included instruments: MCCC_H MCCC_M
Excluded instruments: 0b.MCCC_H#c.avg_leniency 1.MCCC_H#c.avg_leniency
                      0b.MCCC_M#c.avg_leniency
Partialled-out:       market_cap_y_lag12 tobins_q_lag12 cash_ratio_lag12
                      roa_lag12 rd_ratio_lag12 past_12m_ret past_12m_sd
                      EnvSc_yearl1 _cons
                      nb: total SS, model F and R2s are after partialling-out;
                          any small-sample adjustments include partialled-out
                          variables in regressor count K
Dropped collinear:    1.MCCC_M#c.ln_Num_Pats_Granted_w
                      0b.MCCC_L#c.ln_Num_Pats_Granted_w
                      1.MCCC_L#c.ln_Num_Pats_Granted_w
                      1.MCCC_M#c.avg_leniency 0b.MCCC_L#c.avg_leniency
                      1.MCCC_L#c.avg_leniency
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------------------+
             Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------------------+---------------------------------------|
            IND_ID#month |      1623        1623           0    *|
   art_unit_num#dec_year |      1644        1644           0    *|
             Num_Pat_App |        70           1          69     |
-----------------------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
(dropped 2395 singleton observations)
Warning - collinearities detected
Vars dropped:       1.MCCC_M#c.ln_Num_Pats_Granted_w
                    0b.MCCC_L#c.ln_Num_Pats_Granted_w
                    1.MCCC_L#c.ln_Num_Pats_Granted_w
                    1.MCCC_M#c.avg_leniency 0b.MCCC_L#c.avg_leniency
                    1.MCCC_L#c.avg_leniency
(MWFE estimator converged in 32 iterations)
Thanks in advance,
Philipp