Questions re: PPML for research on trade

James Baloto

Join Date: Jan 2024

Posts: 7
#1


26 May 2024, 07:27

Good Day,

I am new to both Stata and econometrics. I am currently researching the impact of institutional quality on exports of wood products over a 21-year period. The key variables in my study are:
trade_Musd: Trade value in million USD.

ln_distancebv: Natural logarithm of the distance between trade partners.

ln_gdp15_Obv: Natural logarithm of the constant 2015 GDP of the reporting country.

ln_gdp15_Dbv: Natural logarithm of the constant 2015 GDP of the partner country.

contigbv: Indicator for whether countries share a border (contiguity).

comlang_offbv: Indicator for whether countries share an official language.

gee_reporter_5, rqe_reporter_5, rle_reporter_5: Indicators of institutional quality, rescaled from the original -2.5 to 2.5 range to a 0 to 5 range.

Additionally, variables ending with “bv” have been adjusted using the Bonus-Vetus Method proposed by Baier and Bergstrand to address Multilateral Trade Resistance (MTR).

I am using PPML to account for zero trade flows, which occur in 132 of the 3044 observations. I have specified my model as follows:

.ppmlhdfe trade_Musd YR* Imp_FE* ln_distancebv ln_gdp15_Obv ln_gdp15_Dbv contigbv comlang_offbv gee_reporter_5 rqe_reporter_5 rle_reporter_5, cluster (country_pair)

However, I have encountered warnings and issues during modeling:
Warning: The dependent variable takes very low values after standardizing (4.7427e-07).

Note: Variables YR21 and Imp_FE30 were omitted due to collinearity.

This is my code and its output in Stata:

ppmlhdfe trade_Musd YR* Imp_FE* ln_distancebv ln_gdp15_Obv ln_gdp15_Dbv contigbv comlang_offbv gee_reporter_5 rqe_reporter_5 rle_reporter_5 , cluster ( country_pair)
warning: dependent variable takes very low values after standardizing (4.7427e-07)
note: 2 variables omitted because of collinearity: YR21 Imp_FE30
Iteration 1: deviance = 1.2667e+05 eps = . iters = 1 tol = 1.0e-04 min(eta) = -4.26 P
Iteration 2: deviance = 9.6560e+04 eps = 3.12e-01 iters = 1 tol = 1.0e-04 min(eta) = -5.47
Iteration 3: deviance = 9.4215e+04 eps = 2.49e-02 iters = 1 tol = 1.0e-04 min(eta) = -6.10
Iteration 4: deviance = 9.4178e+04 eps = 3.91e-04 iters = 1 tol = 1.0e-04 min(eta) = -6.20
Iteration 5: deviance = 9.4178e+04 eps = 4.23e-07 iters = 1 tol = 1.0e-04 min(eta) = -6.20
Iteration 6: deviance = 9.4178e+04 eps = 4.26e-12 iters = 1 tol = 1.0e-05 min(eta) = -6.20 S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
Converged in 6 iterations and 6 HDFE sub-iterations (tol = 1.0e-08)

PPML regression No. of obs = 2,012
Residual df = 144
Statistics robust to heteroskedasticity Wald chi2(57) = 6317.31
Deviance = 94177.76498 Prob > chi2 = 0.0000
Log pseudolikelihood = -50876.06515 Pseudo R2 = 0.7598

Number of clusters (country_pair)= 145
(Std. err. adjusted for 145 clusters in country_pair)
--------------------------------------------------------------------------------
| Robust
trade_Musd | Coefficient std. err. z P>|z| [95% conf. interval]
---------------+----------------------------------------------------------------
YR1 | .5201256 .4521295 1.15 0.250 -.366032 1.406283
YR2 | .5693542 .425712 1.34 0.181 -.265026 1.403735
YR3 | .0812014 .3480763 0.23 0.816 -.6010157 .7634185
YR4 | -.0667056 .3440161 -0.19 0.846 -.7409648 .6075537
YR5 | -.2977823 .3044973 -0.98 0.328 -.894586 .2990213
YR6 | -.453788 .3123138 -1.45 0.146 -1.065912 .1583358
YR7 | -.3775395 .2578755 -1.46 0.143 -.8829662 .1278871
YR8 | -.27286 .258202 -1.06 0.291 -.7789266 .2332065
YR9 | -.4517603 .3093984 -1.46 0.144 -1.05817 .1546495
YR10 | -.5358888 .2348802 -2.28 0.023 -.9962455 -.0755321
YR11 | -.4674764 .2593535 -1.80 0.071 -.9758 .0408472
YR12 | -.2157911 .2293564 -0.94 0.347 -.6653214 .2337392
YR13 | -.0412336 .2089766 -0.20 0.844 -.4508202 .3683529
YR14 | .1085185 .1845772 0.59 0.557 -.2532462 .4702831
YR15 | .3571066 .2162609 1.65 0.099 -.066757 .7809702
YR16 | .5880488 .242982 2.42 0.016 .1118129 1.064285
YR17 | .518406 .2218798 2.34 0.019 .0835296 .9532824
YR18 | .6243174 .2428429 2.57 0.010 .1483541 1.100281
YR19 | .7592175 .3363167 2.26 0.024 .100049 1.418386
YR20 | .6836577 .2278672 3.00 0.003 .2370462 1.130269
YR21 | 0 (omitted)
Imp_FE1 | 1.58989 .4691004 3.39 0.001 .6704697 2.50931
Imp_FE2 | 1.542975 .5017832 3.07 0.002 .5594977 2.526452
Imp_FE3 | .9104672 .4975008 1.83 0.067 -.0646164 1.885551
Imp_FE4 | -.1596206 .4402698 -0.36 0.717 -1.022534 .7032923
Imp_FE5 | 3.851282 .5783316 6.66 0.000 2.717773 4.984792
Imp_FE6 | 1.448788 .5446943 2.66 0.008 .3812065 2.516369
Imp_FE7 | -.2576722 .4723964 -0.55 0.585 -1.183552 .6682078
Imp_FE8 | .2248053 .4897663 0.46 0.646 -.7351189 1.18473
Imp_FE9 | .5332713 .4409573 1.21 0.227 -.3309891 1.397532
Imp_FE10 | 1.640335 .451596 3.63 0.000 .7552227 2.525446
Imp_FE11 | 1.518954 .5736724 2.65 0.008 .3945769 2.643332
Imp_FE12 | .1505784 1.110076 0.14 0.892 -2.025132 2.326288
Imp_FE13 | 1.884956 .574301 3.28 0.001 .7593463 3.010565
Imp_FE14 | .6061414 .7510758 0.81 0.420 -.8659401 2.078223
Imp_FE15 | .3853813 .4876849 0.79 0.429 -.5704635 1.341226
Imp_FE16 | .1135511 .5075052 0.22 0.823 -.8811408 1.108243
Imp_FE17 | 4.574264 .5250587 8.71 0.000 3.545167 5.60336
Imp_FE18 | 2.689692 .4984901 5.40 0.000 1.712669 3.666715
Imp_FE19 | -.148624 .4893643 -0.30 0.761 -1.10776 .8105124
Imp_FE20 | -.4661252 .5843572 -0.80 0.425 -1.611444 .6791938
Imp_FE21 | 1.591687 .7409989 2.15 0.032 .1393559 3.044018
Imp_FE22 | 1.66037 .5329054 3.12 0.002 .6158944 2.704845
Imp_FE23 | 1.034219 .7812646 1.32 0.186 -.4970311 2.56547
Imp_FE24 | 1.489572 .5090167 2.93 0.003 .4919174 2.487226
Imp_FE25 | 1.176602 .5340354 2.20 0.028 .1299117 2.223292
Imp_FE26 | 1.105201 .6090627 1.81 0.070 -.0885398 2.298942
Imp_FE27 | 2.654559 .4340863 6.12 0.000 1.803765 3.505352
Imp_FE28 | 1.531106 .8776364 1.74 0.081 -.1890299 3.251242
Imp_FE29 | .3832464 .6973546 0.55 0.583 -.9835435 1.750036
Imp_FE30 | 0 (omitted)
ln_distancebv | -.8236917 .3705728 -2.22 0.026 -1.550001 -.0973823
ln_gdp15_Obv | 2.136898 4.918015 0.43 0.664 -7.502234 11.77603
ln_gdp15_Dbv | 54.73018 5.862969 9.33 0.000 43.23897 66.22139
contigbv | 1.298234 .6498093 2.00 0.046 .0246307 2.571836
comlang_offbv | -.7467727 .362121 -2.06 0.039 -1.456517 -.0370286
gee_reporter_5 | 3.706266 .5156612 7.19 0.000 2.695588 4.716943
rqe_reporter_5 | -2.172155 .4599424 -4.72 0.000 -3.073626 -1.270684
rle_reporter_5 | -.8153704 .5973095 -1.37 0.172 -1.986075 .3553347
_cons | -.8481844 .5664779 -1.50 0.134 -1.958461 .262092
--------------------------------------------------------------------------------

Given this context, I have several questions:
Am I using the ppmlhdfe command correctly for my research aims?

Is the presence of many zero trade flows (132 out of 3044 observations) sufficient justification for using PPML over OLS and fixed effects estimations?

Should I be concerned about the warning and the collinearity note, and do they mean that my model or dataset needs further work? Also, is a Pseudo R2 of 0.7598 too high?

What tests or diagnostics would you recommend to check the robustness of my model? I have read some of the posts here about using the RESET test for PPML, as suggested by Professor Silva. Are there any other tests I should run?

I appreciate any insights or suggestions you could provide to help enhance the validity of my analysis.

Thank you very much for your assistance.

James

Last edited by James Baloto; 26 May 2024, 07:31.
Tags: gravity model, panel data, PPML, trade
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#2

26 May 2024, 12:03

Dear James Baloto,

1. No, you are including the fixed effects as regressors when you can and should absorb them.
2. PPML is preferable to OLS in logs even if there are no zeros.
3. The Pseudo R2 is meaningless; you can ignore it. As for the multicollinearity, you only need to worry if the variable of interest drops out.
4. At most, do the RESET test.

Finally, if you are including the fixed effects, do you really need the BV transformation?

Best wishes,

Joao

James Baloto

Join Date: Jan 2024
Posts: 7

26 May 2024, 15:49

Dear Professor Joao,

Thank you very much for your insightful feedback. After considering your comments I have done the following:

I have corrected my code and dropped the BV transformation, given that fixed effects are included (comparing the results with and without the BV transformation, there is no significant difference in the coefficients). With the code below, ppmlhdfe absorbed 51 dimensions of fixed effects, corresponding to the year dummies (21 years) and the importer dummies (30 countries). As far as I can tell, no variable of interest drops out of my model. My question is: should I be worried about the "Absorbed degrees of freedom" table? I am concerned about the note next to the Num. Coefs column, which says "? = number of redundant parameters may be higher".

This is the corrected code and its output, for context:

ppmlhdfe trade ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5, absorb(YR* Imp_FE*)

(warning: absorbing 51 dimensions of fixed effects; check that you really want that)
warning: dependent variable takes very low values after standardizing (4.7427e-07)

Iteration 1: deviance = 1.1649e+11 eps = . iters = 4 tol = 1.0e-04 min(eta) = -4.21 P
Iteration 2: deviance = 8.1984e+10 eps = 4.21e-01 iters = 4 tol = 1.0e-04 min(eta) = -6.33
Iteration 3: deviance = 7.7676e+10 eps = 5.55e-02 iters = 4 tol = 1.0e-04 min(eta) = -8.02
Iteration 4: deviance = 7.7487e+10 eps = 2.44e-03 iters = 3 tol = 1.0e-04 min(eta) = -8.57
Iteration 5: deviance = 7.7486e+10 eps = 9.95e-06 iters = 3 tol = 1.0e-04 min(eta) = -8.61
Iteration 6: deviance = 7.7486e+10 eps = 2.47e-10 iters = 3 tol = 1.0e-05 min(eta) = -8.61 S
Iteration 7: deviance = 7.7486e+10 eps = 6.06e-16 iters = 3 tol = 1.0e-07 min(eta) = -8.61 S
Iteration 8: deviance = 7.7486e+10 eps = 1.52e-16 iters = 1 tol = 1.0e-09 min(eta) = -8.61 S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
Converged in 8 iterations and 25 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression No. of obs = 2,012
Absorbing 51 HDFE groups Residual df = 1,951
Wald chi2(9) = 639.54
Deviance = 7.74861e+10 Prob > chi2 = 0.0000
Log pseudolikelihood = -3.87431e+10 Pseudo R2 = 0.8138
--------------------------------------------------------------------------------
| Robust
trade | Coefficient std. err. z P>|z| [95% conf. interval]
---------------+----------------------------------------------------------------
ln_dist_CAP | -2.067701 .2581101 -8.01 0.000 -2.573587 -1.561814
ln_gdp15_O | 2.506835 .1481999 16.92 0.000 2.216369 2.797302
ln_gdp15_D | .616362 .2466608 2.50 0.012 .1329157 1.099808
commlang_off | -.8434657 .1303495 -6.47 0.000 -1.098946 -.5879854
contg | .7664052 .2605078 2.94 0.003 .2558193 1.276991
comcol | 1.13593 .1590382 7.14 0.000 .8242212 1.44764
gee_reporter_5 | 4.100821 .2774686 14.78 0.000 3.556993 4.64465
rqe_reporter_5 | -2.010342 .2879135 -6.98 0.000 -2.574642 -1.446042
rle_reporter_5 | -1.317353 .2758416 -4.78 0.000 -1.857992 -.7767133
_cons | -50.5563 7.369841 -6.86 0.000 -65.00092 -36.11167
--------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
YR1 | 2 0 2 |
YR2 | 2 1 1 |
YR3 | 2 1 1 ?|
YR4 | 2 1 1 ?|
YR5 | 2 1 1 ?|
YR6 | 2 1 1 ?|
YR7 | 2 1 1 ?|
YR8 | 2 1 1 ?|
YR9 | 2 1 1 ?|
YR10 | 2 1 1 ?|
YR11 | 2 1 1 ?|
YR12 | 2 1 1 ?|
YR13 | 2 1 1 ?|
YR14 | 2 1 1 ?|
YR15 | 2 1 1 ?|
YR16 | 2 1 1 ?|
YR17 | 2 1 1 ?|
YR18 | 2 1 1 ?|
YR19 | 2 1 1 ?|
YR20 | 2 1 1 ?|
YR21 | 2 1 1 ?|
Imp_FE1 | 2 1 1 ?|
Imp_FE2 | 2 1 1 ?|
Imp_FE3 | 2 1 1 ?|
Imp_FE4 | 2 1 1 ?|
Imp_FE5 | 2 1 1 ?|
Imp_FE6 | 2 1 1 ?|
Imp_FE7 | 2 1 1 ?|
Imp_FE8 | 2 1 1 ?|
Imp_FE9 | 2 1 1 ?|
Imp_FE10 | 2 1 1 ?|
Imp_FE11 | 2 1 1 ?|
Imp_FE12 | 2 1 1 ?|
Imp_FE13 | 2 1 1 ?|
Imp_FE14 | 2 1 1 ?|
Imp_FE15 | 2 1 1 ?|
Imp_FE16 | 2 1 1 ?|
Imp_FE17 | 2 1 1 ?|
Imp_FE18 | 2 1 1 ?|
Imp_FE19 | 2 1 1 ?|
Imp_FE20 | 2 1 1 ?|
Imp_FE21 | 2 1 1 ?|
Imp_FE22 | 2 1 1 ?|
Imp_FE23 | 2 1 1 ?|
Imp_FE24 | 2 1 1 ?|
Imp_FE25 | 2 1 1 ?|
Imp_FE26 | 2 1 1 ?|
Imp_FE27 | 2 1 1 ?|
Imp_FE28 | 2 1 1 ?|
Imp_FE29 | 2 1 1 ?|
Imp_FE30 | 2 1 1 ?|
-----------------------------------------------------+
? = number of redundant parameters may be higher

I then ran the RESET test, following the sample code on the Log of Gravity page. The first specification gave a non-significant result (chi-squared statistic of 2.51, p-value of 0.113). I then changed the model to include the forest cover of the exporting country and ran the RESET test again, which gave a chi-squared statistic of 17.12 and a p-value of 0.0000. My questions are: is my code for the RESET test correct, and am I right that the solution to a problem flagged by the test is to improve my model?

The results of my reset test are the following:

First code and its result:

ppmlhdfe trade_Musd ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5 , absorb(YR* Imp_FE)
predict fit, xb
gen fit2=fit^2
ppmlhdfe trade_Musd ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5 fit2 , absorb(YR Imp_FE*)
test fit2=0

( 1) fit2 = 0
chi2( 1) = 2.51
Prob > chi2 = 0.1130

Second code and its result:

ppmlhdfe may23trade ln_dist_CAP ln_gdp15_D ln_for_o commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5, absorb(YR* Imp_FE)
predict fit, xb
gen fit2=fit^2
ppmlhdfe may23trade ln_dist_CAP ln_gdp15_D ln_for_o commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5 fit2, absorb(YR Imp_FE*)
test fit2=0

( 1) fit2 = 0
chi2( 1) = 17.12
Prob > chi2 = 0.0000
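
The RESET steps listed here can be laid out as a short do-file sketch. It is only an illustration under stated assumptions: year_id and importer_id are hypothetical numeric category variables absorbed in place of the individual dummies, and clustering on country_pair is carried over from post #1. It is worth checking the ppmlhdfe help file for whether the fitted index used in the test should also include the absorbed fixed effects.

```stata
* Regressor list copied from the first specification in this post
local xvars ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol ///
    gee_reporter_5 rqe_reporter_5 rle_reporter_5

* Step 1: fit the model and keep the fitted linear index
* (year_id and importer_id are assumed names, not variables from the thread)
ppmlhdfe trade_Musd `xvars', absorb(year_id importer_id) vce(cluster country_pair)
predict double fit, xb

* Step 2: add the squared index as an extra regressor and test it
generate double fit2 = fit^2
ppmlhdfe trade_Musd `xvars' fit2, absorb(year_id importer_id) vce(cluster country_pair)
test fit2 = 0

* Clean up so the same lines can be rerun for another specification
drop fit fit2
```

A small p-value from the final test is evidence against the specification. Dropping fit and fit2 at the end avoids a "variable already defined" error when the procedure is repeated.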

I look forward to your guidance on these matters.

Thank you and best regards,

James


Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#4

26 May 2024, 22:10

Dear James Baloto,

You are not using the command correctly. You should not absorb the dummies but the variables identifying the categories. For example, rather than absorbing one dummy for each year, you should just absorb the variable year (please check the help file).
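
A minimal sketch of what this could look like, assuming the dataset holds only the dummies YR1 to YR21 and Imp_FE1 to Imp_FE30 shown in the earlier posts. The names year_id and importer_id are hypothetical, the regressor list is copied from post #3, and the clustering is carried over from post #1. If the data already contain a numeric year or importer identifier, that variable can be absorbed directly.

```stata
* Rebuild one categorical variable per dimension from the existing dummies
* (year_id and importer_id are illustrative names, not from the thread)
generate int year_id = .
forvalues t = 1/21 {
    replace year_id = `t' if YR`t' == 1
}
generate int importer_id = .
forvalues j = 1/30 {
    replace importer_id = `j' if Imp_FE`j' == 1
}

* Absorb the two category variables instead of the 51 separate dummies
ppmlhdfe trade ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol ///
    gee_reporter_5 rqe_reporter_5 rle_reporter_5, ///
    absorb(year_id importer_id) vce(cluster country_pair)
```

With this setup the "Absorbed degrees of freedom" table should list one row per absorbed variable rather than one row per dummy.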

Also, do you have a single exporter? If not, you probably want to include exporter fixed effects.

Best wishes,

Joao

Last edited by Joao Santos Silva; 26 May 2024, 22:22.
James Baloto

Join Date: Jan 2024

Posts: 7
#5

28 May 2024, 09:47

Thank you for the comment, Prof. Silva. After checking the help file for ppmlhdfe, I revised my code to include importer and exporter fixed effects.

ppmlhdfe trade dist gdp15 contig comlang_off comrelig col45 wto gee_r rqe_r rle_r, a( importer##c.year exporter##c.year)
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#6

28 May 2024, 09:51

Do not use the c. before the year variable.
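
For context, a sketch of the difference, based on the factor-variable conventions that ppmlhdfe shares with reghdfe as I understand them: the c. prefix treats year as continuous, so importer##c.year requests importer fixed effects plus an importer-specific linear trend in year, whereas importer#year requests a separate fixed effect for every importer-year pair. The variable list is taken from post #5.

```stata
* importer##c.year: importer intercepts plus importer-specific slopes on year
* (the specification estimated in post #5)
ppmlhdfe trade dist gdp15 contig comlang_off comrelig col45 wto gee_r rqe_r rle_r, ///
    absorb(importer##c.year exporter##c.year)

* importer#year: one fixed effect per importer-year cell
ppmlhdfe trade dist gdp15 contig comlang_off comrelig col45 wto gee_r rqe_r rle_r, ///
    absorb(importer#year exporter#year)
```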
James Baloto

Join Date: Jan 2024

Posts: 7
#7

28 May 2024, 11:18

Thank you for your feedback, Professor Silva. I modified the code as suggested; however, this resulted in the exclusion of several observations and the omission of key variables that are crucial for my analysis. Could you please advise on how to address these issues?

ppmlhdfe trade dist gdp15 contig comlang_off comrelig col45 wto gee_r rqe_r rle_r, a( importer#year exporter#year)

(dropped 51 observations that are either singletons or separated by a fixed effect)
warning: dependent variable takes very low values after standardizing (4.6093e-07)

note: 5 variables omitted because of collinearity: ln_gdp15_O wto_o gee_reporter rqe_reporter rle_reporter
Iteration 1: deviance = 8.0111e+04 eps = . iters = 7 tol = 1.0e-04 min(eta) = -4.05 P
Iteration 2: deviance = 4.7026e+04 eps = 7.04e-01 iters = 6 tol = 1.0e-04 min(eta) = -7.06
Iteration 3: deviance = 3.9629e+04 eps = 1.87e-01 iters = 7 tol = 1.0e-04 min(eta) = -10.53
Iteration 4: deviance = 3.8342e+04 eps = 3.36e-02 iters = 7 tol = 1.0e-04 min(eta) = -13.06
Iteration 5: deviance = 3.8261e+04 eps = 2.11e-03 iters = 7 tol = 1.0e-04 min(eta) = -13.77
Iteration 6: deviance = 3.8258e+04 eps = 8.96e-05 iters = 6 tol = 1.0e-04 min(eta) = -13.81
Iteration 7: deviance = 3.8258e+04 eps = 1.16e-05 iters = 5 tol = 1.0e-05 min(eta) = -13.81
Iteration 8: deviance = 3.8258e+04 eps = 1.07e-06 iters = 2 tol = 1.0e-05 min(eta) = -13.81
Iteration 9: deviance = 3.8258e+04 eps = 2.14e-08 iters = 3 tol = 1.0e-06 min(eta) = -13.81 S
Iteration 10: deviance = 3.8258e+04 eps = 1.54e-11 iters = 2 tol = 1.0e-07 min(eta) = -13.81 S
Iteration 11: deviance = 3.8258e+04 eps = 3.47e-15 iters = 4 tol = 1.0e-09 min(eta) = -13.81 S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
Converged in 11 iterations and 56 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression No. of obs = 1,867
Absorbing 2 HDFE groups Residual df = 1,232
Wald chi2(5) = 223.53
Deviance = 38257.53494 Prob > chi2 = 0.0000
Log pseudolikelihood = -22722.68485 Pseudo R2 = 0.8876
------------------------------------------------------------------------------
| Robust
trade | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
dist | -4.308031 .4017945 -10.72 0.000 -5.095534 -3.520528
gdp15 | 0 (omitted)
contig | -.5269193 .2717843 -1.94 0.053 -1.059607 .0057682
comlang_off | -.8912416 .156393 -5.70 0.000 -1.197766 -.5847169
comrelig | -.3250128 .3072165 -1.06 0.290 -.927146 .2771204
col45 | .5033801 .1017159 4.95 0.000 .3040206 .7027396
wto_o | 0 (omitted)
gee_r | 0 (omitted)
rqe_r | 0 (omitted)
rle_r | 0 (omitted)
_cons | 41.9525 3.38319 12.40 0.000 35.32157 48.58343
------------------------------------------------------------------------------

Absorbed degrees of freedom:
---------------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-----------------+---------------------------------------|
importer#year | 551 0 551 |
exporter#year | 99 20 79 |
---------------------------------------------------------+

Last edited by James Baloto; 28 May 2024, 11:21.
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#8

28 May 2024, 11:55

I believe that those variables are dropped because they are collinear with the fixed effects. Do not worry about the dropped observations.
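
One way to see this collinearity directly, as a rough sketch: a regressor that takes a single value within each exporter-year cell is a linear combination of the exporter#year fixed effects and cannot be estimated separately. The check below uses gee_r from the command in post #7; sd_gee is a hypothetical helper name.

```stata
* Standard deviation of gee_r within each exporter-year cell
bysort exporter year: egen double sd_gee = sd(gee_r)

* If the maximum is 0 (missing for single-observation cells), gee_r does not
* vary within exporter#year and is absorbed by those fixed effects
summarize sd_gee
drop sd_gee
```

The same check can be repeated for the other omitted variables by swapping the variable name.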
