Hi all,
I am attempting to replicate results from a paper so that I can extend its analysis in subsequent research. I am running into problems with my 2SLS regressions and would greatly appreciate your insights. I suspect that my code or data set is at fault, since my OLS regressions produce meaningful results.
As an example, I am analyzing the effect of climate patents being granted on cumulative abnormal returns (CARs) over the subsequent 18 months. For this, I split the data into terciles based on the Media Coverage of Climate Change (MCCC), represented by three dummy variables: MCCC_H, MCCC_M, and MCCC_L.
When conducting the 2SLS regressions, I am encountering two primary problems:
- Very high standard errors in some cases
- No results at all due to the covariance matrix not being of full rank (example regression output included below).
Data:
The dataset is a firm-month panel with approximately 25,000 observations in total. However, each CAR regression drops the majority of them and typically uses only 7,500–8,000 observations.
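To see where the observations are lost, a minimal sketch of a complete-case count for one horizon (k = 9) could look like the following. It assumes all listed variables are numeric, and touse is a hypothetical marker variable that does not exist in my data set.

```stata
* Sketch: how many observations have complete data for the k = 9 regression?
* touse is a hypothetical marker variable; all listed variables are assumed numeric.
local controls market_cap_y_lag12 tobins_q_lag12 cash_ratio_lag12 roa_lag12 ///
    rd_ratio_lag12 past_12m_ret past_12m_sd EnvSc_yearl1

* Start with every observation marked, then unmark rows with any missing value
mark touse
markout touse CAR_9_w ln_Num_Pats_Granted_w avg_leniency ///
    MCCC_H MCCC_M MCCC_L `controls' ///
    IND_ID month art_unit_num dec_year Num_Pat_App

* Complete cases before any singleton groups are dropped by the estimator
count if touse
```

Comparing this count with the reported number of observations plus the dropped singletons should show how much of the loss comes from missing values and how much from the fixed effects.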
Model:
I am using a two-stage least squares (2SLS) regression with extensive fixed effects and firm-level controls. My primary Stata command is ivreghdfe. Here’s an outline of the regression setup:
Dependent variable: CAR_k_w, measured from time t to time t + k
Main independent variable: ln_Num_Pats_Granted_w, instrumented using avg_leniency
Firm Controls: market_cap_y_lag12 tobins_q_lag12 cash_ratio_lag12 roa_lag12 rd_ratio_lag12 past_12m_ret past_12m_sd EnvSc_yearl1
Fixed effects: Industry × Month FE, Art Unit × Year FE, and Num_Pat_App FE
Standard errors: double-clustered at the art-unit and industry-year level (the latter created with egen cluster_var = group(IND_ID dec_year))
Partialling out: I am currently partialling out my firm controls; however, I have to admit I am not too familiar with this type of procedure.
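As far as I understand it, partialling out rests on the Frisch-Waugh-Lovell theorem: residualizing the outcome and the regressor of interest on the controls leaves the coefficient of interest unchanged. A minimal OLS sketch of that idea, using Stata's built-in auto data rather than my own data (all variable and scalar names here are illustrative only):

```stata
* Sketch of the Frisch-Waugh-Lovell logic behind partialling out (OLS case)
* Uses the built-in auto dataset; names are illustrative, not from my data.
sysuse auto, clear

* Full regression: coefficient on weight, controlling for mpg and length
regress price weight mpg length
scalar b_full = _b[weight]

* Partial the controls out of the outcome
regress price mpg length
predict double price_res, residuals

* Partial the controls out of the regressor of interest
regress weight mpg length
predict double weight_res, residuals

* Regressing residual on residual recovers the same coefficient
regress price_res weight_res
scalar b_fwl = _b[weight_res]

display "full: " b_full "   partialled-out: " b_fwl
assert reldif(b_full, b_fwl) < 1e-8
```

In the IV case the same residualization would also be applied to the instruments, so the coefficients on the partialled-out controls are simply not reported.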
Equation for the second-stage regression:
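A sketch of the second stage as implied by the regression code in this post. The notation is my own assumption and is not taken from the paper: i indexes firms, t months, X collects the firm controls, and the alpha terms stand for the three sets of absorbed fixed effects.

```latex
\[
\mathrm{CAR}_{i,t \to t+k}
  = \sum_{j \in \{H, M, L\}} \beta_j \,
      \widehat{\bigl(\ln \mathrm{Pats}_{i,t} \times \mathrm{MCCC}^{j}_{t}\bigr)}
  + \gamma_H \, \mathrm{MCCC}^{H}_{t}
  + \gamma_M \, \mathrm{MCCC}^{M}_{t}
  + \delta' X_{i,t}
  + \alpha_{\mathrm{ind} \times \mathrm{month}}
  + \alpha_{\mathrm{art\,unit} \times \mathrm{year}}
  + \alpha_{\mathrm{apps}}
  + \varepsilon_{i,t}
\]
```

The hats denote first-stage fitted values, with each interaction instrumented by the corresponding interaction of avg_leniency with the same tercile dummy.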
Regression Code:
Code:
* Run regressions
forvalues k = 0/18 {
    * IV regression with interactions
    ivreghdfe CAR_`k'_w ///
        MCCC_H MCCC_M ///
        market_cap_y_lag12 tobins_q_lag12 cash_ratio_lag12 roa_lag12 ///
        rd_ratio_lag12 past_12m_ret past_12m_sd EnvSc_yearl1 ///
        (c.ln_Num_Pats_Granted_w#i.MCCC_H ///
         c.ln_Num_Pats_Granted_w#i.MCCC_M ///
         c.ln_Num_Pats_Granted_w#i.MCCC_L = ///
         c.avg_leniency#i.MCCC_H ///
         c.avg_leniency#i.MCCC_M ///
         c.avg_leniency#i.MCCC_L) ///
        , absorb(IND_ID#month art_unit_num#dec_year Num_Pat_App) ///
        cluster(art_unit_num cluster_var) ///
        partial(market_cap_y_lag12 tobins_q_lag12 cash_ratio_lag12 roa_lag12 rd_ratio_lag12 past_12m_ret past_12m_sd EnvSc_yearl1)
}
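One check I can run to see whether the interaction terms are mechanically collinear rather than a symptom of a data problem. This is only a sketch: it assumes MCCC_H, MCCC_M, and MCCC_L are 0/1 dummies with exactly one equal to 1 per observation, and the chk_ variables are hypothetical helpers.

```stata
* Sketch: are the factor-variable interactions mechanically collinear?
* Assumes the three tercile dummies are 0/1 and mutually exclusive.
* Variables starting with chk_ are hypothetical helpers.
preserve
keep if !missing(ln_Num_Pats_Granted_w, MCCC_H, MCCC_M, MCCC_L)

* Each observation should fall in exactly one tercile (sum equal to 1)
gen byte chk_sum = MCCC_H + MCCC_M + MCCC_L
tabulate chk_sum

* c.x#i.MCCC_H expands to a 0-level and a 1-level term; the 0-level term
* equals x for every observation outside the high tercile
gen double chk_H0 = ln_Num_Pats_Granted_w * (MCCC_H == 0)
gen double chk_M1 = ln_Num_Pats_Granted_w * (MCCC_M == 1)
gen double chk_L1 = ln_Num_Pats_Granted_w * (MCCC_L == 1)

* A count of 0 means the 0-level H term is an exact sum of the M and L terms
count if abs(chk_H0 - (chk_M1 + chk_L1)) > 1e-10
restore
```

If the count is zero, only three of the six generated interaction terms carry independent variation, which would be consistent with the "Dropped collinear" lines in the output.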
Example Output:
Code:
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on art_unit_num and cluster_var
Number of clusters (art_unit_num) =    378              Number of obs =     7787
Number of clusters (cluster_var)  =    204              F(  3,   203) =     0.43
                                                        Prob > F      =   0.7292
Total (centered) SS     =  249.3154017                  Centered R2   =  -0.0248
Total (uncentered) SS   =  249.3154017                  Uncentered R2 =  -0.0248
Residual SS             =  255.4890154                  Root MSE      =    .1821
------------------------------------------------------------------------------------------------
                               |               Robust
                       CAR_9_w | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------------------------+----------------------------------------------------------------
MCCC_H#c.ln_Num_Pats_Granted_w |
                            0  |   .0509095   .1007255     0.51   0.614    -.1476928    .2495118
                            1  |  -.0557951   .1398749    -0.40   0.690    -.3315891    .2199989
                               |
MCCC_M#c.ln_Num_Pats_Granted_w |
                            0  |   .0669551   .1072587     0.62   0.533     -.144529    .2784391
                               |
                        MCCC_H |          0  (omitted)
                        MCCC_M |          0  (omitted)
------------------------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 17.629
Chi-sq(1) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 19.417
(Kleibergen-Paap rk Wald F statistic): 11.216
Stock-Yogo weak ID test critical values: <not available>
------------------------------------------------------------------------------
Warning: estimated covariance matrix of moment conditions not of full rank.
overidentification statistic not reported, and standard errors and
model tests should be interpreted with caution.
Possible causes:
number of clusters insufficient to calculate robust covariance matrix
singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
partial option may address problem.
------------------------------------------------------------------------------
Collinearities detected among instruments: 2 instrument(s) dropped
Instrumented: 0b.MCCC_H#c.ln_Num_Pats_Granted_w
1.MCCC_H#c.ln_Num_Pats_Granted_w
0b.MCCC_M#c.ln_Num_Pats_Granted_w
Included instruments: MCCC_H MCCC_M
Excluded instruments: 0b.MCCC_H#c.avg_leniency 1.MCCC_H#c.avg_leniency
0b.MCCC_M#c.avg_leniency
Partialled-out: market_cap_y_lag12 tobins_q_lag12 cash_ratio_lag12
roa_lag12 rd_ratio_lag12 past_12m_ret past_12m_sd
EnvSc_yearl1 _cons
nb: total SS, model F and R2s are after partialling-out;
any small-sample adjustments include partialled-out
variables in regressor count K
Dropped collinear: 1.MCCC_M#c.ln_Num_Pats_Granted_w
0b.MCCC_L#c.ln_Num_Pats_Granted_w
1.MCCC_L#c.ln_Num_Pats_Granted_w 1.MCCC_M#c.avg_leniency
0b.MCCC_L#c.avg_leniency 1.MCCC_L#c.avg_leniency
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------------------+
             Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------------------+---------------------------------------|
            IND_ID#month |      1623        1623           0    *|
   art_unit_num#dec_year |      1644        1644           0    *|
             Num_Pat_App |        70           1          69     |
-----------------------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
(dropped 2395 singleton observations)
Warning - collinearities detected
Vars dropped: 1.MCCC_M#c.ln_Num_Pats_Granted_w
0b.MCCC_L#c.ln_Num_Pats_Granted_w
1.MCCC_L#c.ln_Num_Pats_Granted_w 1.MCCC_M#c.avg_leniency
0b.MCCC_L#c.avg_leniency 1.MCCC_L#c.avg_leniency
(MWFE estimator converged in 32 iterations)
Any insights on potential mistakes in my code or structural issues in the regression setup would be highly appreciated.
Thanks in advance,
Philipp
