PPMLHDFE - omitted variables

Pericles Sa Nogueira

Join Date: Aug 2024

Posts: 6
#1

30 Sep 2024, 22:17

Hello community,

I am using the ppmlhdfe regression to study the impact of common language (comlang_off) on trade (tradeflow_baci) between many African countries.

For that I am using exporter-year fixed effects (exp_year), importer-year fixed effects (imp_year) and country-pair fixed effects (pair_id).

1 - Regression: ppmlhdfe tradeflow_baci fta_wto ln_dist contig comlang_off, a(exp_year imp_year pair_id) cluster (pair_id) nolog

Output with country pair_id fixed effects:

(dropped 283 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (5.1476e-09)
note: 1 variable omitted because of collinearity: comlang_off
note: ln_dist is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)
Converged in 13 iterations and 69 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression No. of obs = 9,051
Absorbing 3 HDFE groups Residual df = 2,047
Statistics robust to heteroskedasticity Wald chi2(2) = 1.17
Deviance = 52472688.32 Prob > chi2 = 0.5574
Log pseudolikelihood = -26271738.18 Pseudo R2 = 0.9676

Number of clusters (pair_id)= 2,048
(Std. err. adjusted for 2,048 clusters in pair_id)
------------------------------------------------------------------------------
| Robust
tradeflow_~i | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
fta_wto | -.2064625 .2064227 -1.00 0.317 -.6110436 .1981186
ln_dist | 0 (omitted)
contig | .0691352 .1640386 0.42 0.673 -.2523745 .3906449
comlang_off | 0 (omitted)
_cons | 13.10739 .1680008 78.02 0.000 12.77812 13.43667
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
exp_year | 306 1 305 |
imp_year | 308 6 302 |
pair_id | 2048 2048 0 *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
2 - Regression without country pair_id: ppmlhdfe tradeflow_baci fta_wto ln_dist contig comlang_off, a(exp_year imp_year) cluster (pair_id) nolog

Output without country pair_id fixed effects:

(dropped 2 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (5.2246e-09)
Converged in 11 iterations and 50 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression No. of obs = 9,332
Absorbing 2 HDFE groups Residual df = 2,328
Statistics robust to heteroskedasticity Wald chi2(4) = 732.50
Deviance = 248048316.2 Prob > chi2 = 0.0000
Log pseudolikelihood = -124060124.6 Pseudo R2 = 0.8487

Number of clusters (pair_id)= 2,329
(Std. err. adjusted for 2,329 clusters in pair_id)
------------------------------------------------------------------------------
| Robust
tradeflow_~i | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
fta_wto | .7144648 .1703926 4.19 0.000 .3805013 1.048428
ln_dist | -.8836056 .1328718 -6.65 0.000 -1.14403 -.6231816
contig | .773755 .1805465 4.29 0.000 .4198903 1.12762
comlang_off | .670514 .1545234 4.34 0.000 .3676537 .9733744
_cons | 17.56397 1.064159 16.51 0.000 15.47825 19.64968
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
exp_year | 306 0 306 |
imp_year | 308 6 302 |

Question: in regression 1 (with pair_id fixed effects), the variable comlang_off (a dummy equal to 1 if the two countries share a language) is omitted because of collinearity, and I am not sure why. It looks like the country-pair fixed effects are absorbing the effect of comlang_off. If so, can I use regression 2, without the country-pair (pair_id) fixed effects, without any concern about endogeneity?

Thoughts and comments are welcome.
Thank you in advance
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#2

01 Oct 2024, 06:53

Dear Pericles Sa Nogueira,

Indeed, in model 1 the pair fixed effects will absorb all characteristics of the pair that do not vary over time. So, if there are no changes to the languages, that variable (like distance) will drop out. In the second regression, the effect of FTAs is likely to be inflated by endogeneity, and this will affect the other estimates. Therefore, I suggest estimating model 2 without the FTA variable to see if the coefficient of interest changes much. If not, then you should be OK.
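
One way to confirm this in the data is to check whether the language dummy ever changes within a pair. The lines below are only a sketch: pair_id and comlang_off are the variable names from post #1, while lang_varies and pair_tag are hypothetical helper names introduced for illustration.

```stata
* Sketch: flag country pairs whose comlang_off changes over time.
* Within each pair, sort by comlang_off and compare the smallest and
* largest values; they differ only if the dummy varies within the pair.
* Note: missing values sort last, so a pair with some missing
* comlang_off observations is also flagged as varying.
bysort pair_id (comlang_off): gen byte lang_varies = comlang_off[1] != comlang_off[_N]

* Keep one row per pair when tabulating, so pairs are not weighted
* by the number of years they appear in.
egen byte pair_tag = tag(pair_id)
tabulate lang_varies if pair_tag
```

If lang_varies is 0 for every pair, comlang_off is constant within pair_id and cannot be separately identified once the pair fixed effects are included.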

Best regards,

Joao
Pericles Sa Nogueira

Join Date: Aug 2024

Posts: 6
#3

01 Oct 2024, 22:49

Dear Professor Joao Santos Silva,
Thank you very much.

Your reply is much appreciated.

I've run model 2 without the FTA variable and the coefficient for common language didn't change much (before = .670514 vs now = .6927803). I believe there is no major concern about endogeneity.
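
For anyone repeating this comparison, a possible way to put the two versions of model 2 side by side is sketched below. It reuses the commands and variable names from this thread; the stored-estimate names m2_fta and m2_nofta are hypothetical labels chosen for illustration.

```stata
* Sketch: estimate model 2 with and without fta_wto and compare.
ppmlhdfe tradeflow_baci fta_wto ln_dist contig comlang_off, a(exp_year imp_year) cluster(pair_id) nolog
estimates store m2_fta

ppmlhdfe tradeflow_baci ln_dist contig comlang_off, a(exp_year imp_year) cluster(pair_id) nolog
estimates store m2_nofta

* Coefficients and standard errors of the shared regressors, side by side.
estimates table m2_fta m2_nofta, keep(comlang_off ln_dist contig) b(%9.4f) se(%9.4f)

* Relative change in the comlang_off coefficient, in percent,
* using the two point estimates reported in this thread.
display 100 * (.6927803 - .670514) / .670514
```

With the two estimates reported here, the relative change in the comlang_off coefficient works out to roughly 3.3 percent.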

I am sending below the result of the regression using model 2:

. ppmlhdfe tradeflow_baci ln_dist contig comlang_off, a(exp_year imp_year) cluster (pair_id) nolog
(dropped 2 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (5.2246e-09)
Converged in 11 iterations and 49 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression No. of obs = 9,332
Absorbing 2 HDFE groups Residual df = 2,328
Statistics robust to heteroskedasticity Wald chi2(3) = 683.35
Deviance = 258104647.5 Prob > chi2 = 0.0000
Log pseudolikelihood = -129088290.3 Pseudo R2 = 0.8426

Number of clusters (pair_id)= 2,329
(Std. err. adjusted for 2,329 clusters in pair_id)
------------------------------------------------------------------------------
| Robust
tradeflow_~i | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
ln_dist | -1.211516 .1389983 -8.72 0.000 -1.483948 -.9390842
contig | .6477741 .2003857 3.23 0.001 .2550253 1.040523
comlang_off | .6927803 .157021 4.41 0.000 .3850249 1.000536
_cons | 20.42258 1.125855 18.14 0.000 18.21594 22.62921

Thank you
Péricles
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#4

02 Oct 2024, 00:59

Dear Pericles Sa Nogueira,

That is reassuring. Note, however, that the other coefficients change a lot, so do not read too much into them.

Best wishes,

Joao
Tariq Masood

Join Date: Jan 2023

Posts: 27
#5

02 Oct 2024, 04:42

In addition to what Prof. Santos Silva suggested, I think you could keep both the FTA variable and the pair fixed effects and add a second-stage estimation. After predicting the pair fixed effects in the first stage, we can check whether language is related to them.

First Stage Regression:

X_(ij,t) = exp(μ_(i,t) + π_(j,t) + δ_ij + β_z * Z_(ij,t) + ϵ_(ij,t))

Where:
- X_(ij,t) is the trade flow between exporter i and importer j at time t
- μ_(i,t) represents exporter-time fixed effects
- π_(j,t) represents importer-time fixed effects
- δ_ij captures country-pair fixed effects
- Z_(ij,t) is a vector of trade policy variables (such as FTA or WTO membership)

Second Stage Regression:

δ̂_ij = exp(ν_i + ζ_j + β_Lang * Lang_ij + β_k * K_ij + ϵ_ij)

Where:
- δ̂_ij is the estimated pair fixed effect
- ν_i represents exporter-specific fixed effects
- ζ_j represents importer-specific fixed effects
- Lang_ij represents the language variable, which could measure linguistic proximity or a shared language between countries i and j
- K_ij represents other time-invariant factors influencing trade between countries i and j

Code:

use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
egen imp = group(isoimp)
egen exp = group(isoexp)
ppmlhdfe trade fta, a(imp#year exp#year imp#exp, save) cluster(imp#exp)
rename __hdfe3__ pair_fixed
reghdfe pair_fixed comlang_off, a(imp#year exp#year) cluster(imp#exp)
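
The code above runs the second stage as a linear regression on the saved pair effect, with the time-varying fixed effects. A variant that stays closer to the second-stage equation as written (exponential form, exporter and importer fixed effects, one observation per pair) might look like the sketch below. It assumes the first-stage lines above have already been run, so that pair_fixed exists, and that the saved effect is on the log (linear index) scale; pair_tag and pair_level are hypothetical names introduced for illustration.

```stata
* Sketch: second stage in exponential form, one observation per pair.
* Tag a single row per exporter-importer pair among rows where the
* saved pair effect is available.
egen byte pair_tag = tag(imp exp) if !missing(pair_fixed)

* Move the pair effect from the log scale to levels (assumption: the
* saved fixed effect is part of the linear index of the first stage).
gen double pair_level = exp(pair_fixed)

* PPML of the pair effect on the language dummy, with exporter and
* importer fixed effects and heteroskedasticity-robust standard errors.
ppmlhdfe pair_level comlang_off if pair_tag, a(imp exp) vce(robust)
```

Other time-invariant pair variables (the K_ij in the equation) can be added next to comlang_off. Note that the second-stage standard errors do not account for the pair effects being estimated in the first stage.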
Pericles Sa Nogueira

Join Date: Aug 2024

Posts: 6
#6

02 Oct 2024, 20:49

Tariq Masood thank you for your very useful suggestion.

Just to make sure I understood your point: do you mean that by running the code below I will be able to determine whether the pair fixed effects (__hdfe3__) saved by ppmlhdfe are absorbing the common language effect?

If my understanding is correct, in the example you shared common language appears to be significant, so it is related to the pair fixed effects. Right?

Code:

use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
egen imp = group(isoimp)
egen exp = group(isoexp)
ppmlhdfe trade fta, a(imp#year exp#year imp#exp, save) cluster(imp#exp)
rename __hdfe3__ pair_fixed
reghdfe pair_fixed comlang_off, a(imp#year exp#year) cluster(imp#exp)

Output:

(MWFE estimator converged in 3 iterations)

HDFE Linear regression Number of obs = 5,950
Absorbing 2 HDFE groups F( 1, 1189) = 59.20
Statistics robust to heteroskedasticity Prob > F = 0.0000
R-squared = 0.4922
Adj R-squared = 0.4610
Within R-sq. = 0.0447
Number of clusters (imp#exp) = 1,190 Root MSE = 0.8714

(Std. err. adjusted for 1,190 clusters in imp#exp)
------------------------------------------------------------------------------
| Robust
pair_fixed | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
comlang_off | .7275164 .0945506 7.69 0.000 .5420119 .913021
_cons | -1.059597 .0267121 -39.67 0.000 -1.112005 -1.007189

Thanks
Tariq Masood

Join Date: Jan 2023

Posts: 27
#7

03 Oct 2024, 11:22

This is one of many strategies researchers use to get a sense of the absorbed variables. The predicted pair fixed effects are a component of the dependent variable, so in the second stage you are indirectly testing whether the language variable determines trade flows. Because the pair fixed effects absorb everything that is time-invariant, another approach is to use a time-varying measure (like linguistic similarity), if available, which will not be absorbed by the pair fixed effects.