  • Questions re: PPML for research on trade

    Good Day,

    I am new to econometrics and not well versed in Stata. I am currently researching the impact of institutional quality on exports of wood products over 21 years. The key variables in my study are:
    • trade_Musd: Trade value in million USD.
    • ln_distancebv: Natural logarithm of the distance between trade partners.
    • ln_gdp15_Obv: Natural logarithm of the constant 2015 GDP of the reporting country.
    • ln_gdp15_Dbv: Natural logarithm of the constant 2015 GDP of the partner country.
    • contigbv: Indicator for whether countries share a border (contiguity).
    • comlang_offbv: Indicator for whether countries share an official language.
    • gee_reporter_5, rqe_reporter_5, rle_reporter_5: Indicators of institutional quality, rescaled from the original range of -2.5 to 2.5 to a range of 0 to 5.
    Additionally, variables ending with “bv” have been adjusted using the Bonus-Vetus Method proposed by Baier and Bergstrand to address Multilateral Trade Resistance (MTR).

    I am using PPML to account for the zero trade flows in the data, which occur in 132 of 3,044 observations. I have structured my model as follows:

    ppmlhdfe trade_Musd YR* Imp_FE* ln_distancebv ln_gdp15_Obv ln_gdp15_Dbv contigbv comlang_offbv gee_reporter_5 rqe_reporter_5 rle_reporter_5, cluster (country_pair)

    However, I have encountered warnings and issues during modeling:
    • Warning: The dependent variable takes very low values after standardizing (4.7427e-07).
    • Note: Variables YR21 and Imp_FE30 were omitted due to collinearity.
    Here is my code and its output in Stata:

    ppmlhdfe trade_Musd YR* Imp_FE* ln_distancebv ln_gdp15_Obv ln_gdp15_Dbv contigbv comlang_offbv gee_reporter_5 rqe_reporter_5 rle_reporter_5 , cluster ( country_pair)
    warning: dependent variable takes very low values after standardizing (4.7427e-07)
    note: 2 variables omitted because of collinearity: YR21 Imp_FE30
    Iteration 1: deviance = 1.2667e+05 eps = . iters = 1 tol = 1.0e-04 min(eta) = -4.26 P
    Iteration 2: deviance = 9.6560e+04 eps = 3.12e-01 iters = 1 tol = 1.0e-04 min(eta) = -5.47
    Iteration 3: deviance = 9.4215e+04 eps = 2.49e-02 iters = 1 tol = 1.0e-04 min(eta) = -6.10
    Iteration 4: deviance = 9.4178e+04 eps = 3.91e-04 iters = 1 tol = 1.0e-04 min(eta) = -6.20
    Iteration 5: deviance = 9.4178e+04 eps = 4.23e-07 iters = 1 tol = 1.0e-04 min(eta) = -6.20
    Iteration 6: deviance = 9.4178e+04 eps = 4.26e-12 iters = 1 tol = 1.0e-05 min(eta) = -6.20 S O
    ------------------------------------------------------------------------------------------------------------
    (legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
    Converged in 6 iterations and 6 HDFE sub-iterations (tol = 1.0e-08)

    PPML regression No. of obs = 2,012
    Residual df = 144
    Statistics robust to heteroskedasticity Wald chi2(57) = 6317.31
    Deviance = 94177.76498 Prob > chi2 = 0.0000
    Log pseudolikelihood = -50876.06515 Pseudo R2 = 0.7598

    Number of clusters (country_pair)= 145
    (Std. err. adjusted for 145 clusters in country_pair)
    --------------------------------------------------------------------------------
    | Robust
    trade_Musd | Coefficient std. err. z P>|z| [95% conf. interval]
    ---------------+----------------------------------------------------------------
    YR1 | .5201256 .4521295 1.15 0.250 -.366032 1.406283
    YR2 | .5693542 .425712 1.34 0.181 -.265026 1.403735
    YR3 | .0812014 .3480763 0.23 0.816 -.6010157 .7634185
    YR4 | -.0667056 .3440161 -0.19 0.846 -.7409648 .6075537
    YR5 | -.2977823 .3044973 -0.98 0.328 -.894586 .2990213
    YR6 | -.453788 .3123138 -1.45 0.146 -1.065912 .1583358
    YR7 | -.3775395 .2578755 -1.46 0.143 -.8829662 .1278871
    YR8 | -.27286 .258202 -1.06 0.291 -.7789266 .2332065
    YR9 | -.4517603 .3093984 -1.46 0.144 -1.05817 .1546495
    YR10 | -.5358888 .2348802 -2.28 0.023 -.9962455 -.0755321
    YR11 | -.4674764 .2593535 -1.80 0.071 -.9758 .0408472
    YR12 | -.2157911 .2293564 -0.94 0.347 -.6653214 .2337392
    YR13 | -.0412336 .2089766 -0.20 0.844 -.4508202 .3683529
    YR14 | .1085185 .1845772 0.59 0.557 -.2532462 .4702831
    YR15 | .3571066 .2162609 1.65 0.099 -.066757 .7809702
    YR16 | .5880488 .242982 2.42 0.016 .1118129 1.064285
    YR17 | .518406 .2218798 2.34 0.019 .0835296 .9532824
    YR18 | .6243174 .2428429 2.57 0.010 .1483541 1.100281
    YR19 | .7592175 .3363167 2.26 0.024 .100049 1.418386
    YR20 | .6836577 .2278672 3.00 0.003 .2370462 1.130269
    YR21 | 0 (omitted)
    Imp_FE1 | 1.58989 .4691004 3.39 0.001 .6704697 2.50931
    Imp_FE2 | 1.542975 .5017832 3.07 0.002 .5594977 2.526452
    Imp_FE3 | .9104672 .4975008 1.83 0.067 -.0646164 1.885551
    Imp_FE4 | -.1596206 .4402698 -0.36 0.717 -1.022534 .7032923
    Imp_FE5 | 3.851282 .5783316 6.66 0.000 2.717773 4.984792
    Imp_FE6 | 1.448788 .5446943 2.66 0.008 .3812065 2.516369
    Imp_FE7 | -.2576722 .4723964 -0.55 0.585 -1.183552 .6682078
    Imp_FE8 | .2248053 .4897663 0.46 0.646 -.7351189 1.18473
    Imp_FE9 | .5332713 .4409573 1.21 0.227 -.3309891 1.397532
    Imp_FE10 | 1.640335 .451596 3.63 0.000 .7552227 2.525446
    Imp_FE11 | 1.518954 .5736724 2.65 0.008 .3945769 2.643332
    Imp_FE12 | .1505784 1.110076 0.14 0.892 -2.025132 2.326288
    Imp_FE13 | 1.884956 .574301 3.28 0.001 .7593463 3.010565
    Imp_FE14 | .6061414 .7510758 0.81 0.420 -.8659401 2.078223
    Imp_FE15 | .3853813 .4876849 0.79 0.429 -.5704635 1.341226
    Imp_FE16 | .1135511 .5075052 0.22 0.823 -.8811408 1.108243
    Imp_FE17 | 4.574264 .5250587 8.71 0.000 3.545167 5.60336
    Imp_FE18 | 2.689692 .4984901 5.40 0.000 1.712669 3.666715
    Imp_FE19 | -.148624 .4893643 -0.30 0.761 -1.10776 .8105124
    Imp_FE20 | -.4661252 .5843572 -0.80 0.425 -1.611444 .6791938
    Imp_FE21 | 1.591687 .7409989 2.15 0.032 .1393559 3.044018
    Imp_FE22 | 1.66037 .5329054 3.12 0.002 .6158944 2.704845
    Imp_FE23 | 1.034219 .7812646 1.32 0.186 -.4970311 2.56547
    Imp_FE24 | 1.489572 .5090167 2.93 0.003 .4919174 2.487226
    Imp_FE25 | 1.176602 .5340354 2.20 0.028 .1299117 2.223292
    Imp_FE26 | 1.105201 .6090627 1.81 0.070 -.0885398 2.298942
    Imp_FE27 | 2.654559 .4340863 6.12 0.000 1.803765 3.505352
    Imp_FE28 | 1.531106 .8776364 1.74 0.081 -.1890299 3.251242
    Imp_FE29 | .3832464 .6973546 0.55 0.583 -.9835435 1.750036
    Imp_FE30 | 0 (omitted)
    ln_distancebv | -.8236917 .3705728 -2.22 0.026 -1.550001 -.0973823
    ln_gdp15_Obv | 2.136898 4.918015 0.43 0.664 -7.502234 11.77603
    ln_gdp15_Dbv | 54.73018 5.862969 9.33 0.000 43.23897 66.22139
    contigbv | 1.298234 .6498093 2.00 0.046 .0246307 2.571836
    comlang_offbv | -.7467727 .362121 -2.06 0.039 -1.456517 -.0370286
    gee_reporter_5 | 3.706266 .5156612 7.19 0.000 2.695588 4.716943
    rqe_reporter_5 | -2.172155 .4599424 -4.72 0.000 -3.073626 -1.270684
    rle_reporter_5 | -.8153704 .5973095 -1.37 0.172 -1.986075 .3553347
    _cons | -.8481844 .5664779 -1.50 0.134 -1.958461 .262092
    --------------------------------------------------------------------------------


    Given this context, I have several questions:
    1. Am I using the ppmlhdfe command correctly for my research aims?
    2. Is the presence of many zero trade flows (132 out of 3044 observations) sufficient justification for using PPML over OLS and fixed effects estimations?
    3. Should I be concerned about the warning and the collinearity note, and do they mean that my model or dataset needs further work? Is a Pseudo R2 of 0.7598 too high?
    4. What tests or diagnostics would you recommend to check the robustness of my model? I have read some of the posts here about using the RESET test for PPML, as suggested by Professor Silva. Are there any other tests I should run?
    I appreciate any insights or suggestions you could provide to help enhance the validity of my analysis.

    Thank you very much for your assistance.

    James
    Last edited by James Baloto; 26 May 2024, 08:31.

  • #2
    Dear James Baloto,

    1. No, you are including the fixed effects as regressors when you can and should absorb them.
    2. PPML is preferable to OLS in logs even if there are no zeros.
    3. The Pseudo R2 is meaningless; you can ignore it. As for the multicollinearity, you only need to worry if the variable of interest drops out.
    4. At most, do the RESET.

    Finally, if you are including the fixed effects, do you really need the BV transformation?

    Best wishes,

    Joao
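
    To make points 1 and 2 concrete, here is a minimal sketch, assuming the variable names from post #1 plus two hypothetical category variables, year and importer, that do not appear in the data as posted. It only illustrates that taking logs drops the zero flows while PPML keeps them; it is not meant as the final specification. reghdfe is used for the log-linear comparison.

    ```stata
    * Sketch only: year and importer are assumed category variables (hypothetical names)
    local x ln_distancebv ln_gdp15_Obv ln_gdp15_Dbv contigbv comlang_offbv gee_reporter_5 rqe_reporter_5 rle_reporter_5

    * Zero flows: these observations are lost once logs are taken
    count if trade_Musd == 0

    * OLS in logs: ln(0) is missing, so the zero flows drop out of the sample
    generate double ln_trade = ln(trade_Musd)
    reghdfe ln_trade `x', absorb(year importer) vce(cluster country_pair)

    * PPML: dependent variable in levels, so the zero flows stay in the sample
    ppmlhdfe trade_Musd `x', absorb(year importer) vce(cluster country_pair)
    ```

    Comparing the number of observations reported by the two commands shows how many flows the log specification discards. As noted in point 2, the zeros are only one of the arguments for PPML.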

    • #3
      Dear Professor Joao,

      Thank you very much for your insightful feedback. After considering your comments I have done the following:
      1. I have corrected the code and dropped the BV transformation now that fixed effects are included (comparing the results with and without the BV transformation shows no significant difference in the coefficients). With the code below, ppmlhdfe absorbed 51 dimensions of fixed effects, corresponding to the time (21 years) and importer (30 countries) fixed effects. As far as I can tell, no variable of interest drops out of my model. My question is: should I be worried about the "Absorbed degrees of freedom" table? I am concerned about the note attached to the Num. Coefs column, which says "? = number of redundant parameters may be higher".
      This is the corrected code and its output, for context:

      ppmlhdfe trade ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5, absorb(YR* Imp_FE*)

      (warning: absorbing 51 dimensions of fixed effects; check that you really want that)
      warning: dependent variable takes very low values after standardizing (4.7427e-07)

      Iteration 1: deviance = 1.1649e+11 eps = . iters = 4 tol = 1.0e-04 min(eta) = -4.21 P
      Iteration 2: deviance = 8.1984e+10 eps = 4.21e-01 iters = 4 tol = 1.0e-04 min(eta) = -6.33
      Iteration 3: deviance = 7.7676e+10 eps = 5.55e-02 iters = 4 tol = 1.0e-04 min(eta) = -8.02
      Iteration 4: deviance = 7.7487e+10 eps = 2.44e-03 iters = 3 tol = 1.0e-04 min(eta) = -8.57
      Iteration 5: deviance = 7.7486e+10 eps = 9.95e-06 iters = 3 tol = 1.0e-04 min(eta) = -8.61
      Iteration 6: deviance = 7.7486e+10 eps = 2.47e-10 iters = 3 tol = 1.0e-05 min(eta) = -8.61 S
      Iteration 7: deviance = 7.7486e+10 eps = 6.06e-16 iters = 3 tol = 1.0e-07 min(eta) = -8.61 S
      Iteration 8: deviance = 7.7486e+10 eps = 1.52e-16 iters = 1 tol = 1.0e-09 min(eta) = -8.61 S O
      ------------------------------------------------------------------------------------------------------------
      (legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
      Converged in 8 iterations and 25 HDFE sub-iterations (tol = 1.0e-08)

      HDFE PPML regression No. of obs = 2,012
      Absorbing 51 HDFE groups Residual df = 1,951
      Wald chi2(9) = 639.54
      Deviance = 7.74861e+10 Prob > chi2 = 0.0000
      Log pseudolikelihood = -3.87431e+10 Pseudo R2 = 0.8138
      --------------------------------------------------------------------------------
      | Robust
      trade | Coefficient std. err. z P>|z| [95% conf. interval]
      ---------------+----------------------------------------------------------------
      ln_dist_CAP | -2.067701 .2581101 -8.01 0.000 -2.573587 -1.561814
      ln_gdp15_O | 2.506835 .1481999 16.92 0.000 2.216369 2.797302
      ln_gdp15_D | .616362 .2466608 2.50 0.012 .1329157 1.099808
      commlang_off | -.8434657 .1303495 -6.47 0.000 -1.098946 -.5879854
      contg | .7664052 .2605078 2.94 0.003 .2558193 1.276991
      comcol | 1.13593 .1590382 7.14 0.000 .8242212 1.44764
      gee_reporter_5 | 4.100821 .2774686 14.78 0.000 3.556993 4.64465
      rqe_reporter_5 | -2.010342 .2879135 -6.98 0.000 -2.574642 -1.446042
      rle_reporter_5 | -1.317353 .2758416 -4.78 0.000 -1.857992 -.7767133
      _cons | -50.5563 7.369841 -6.86 0.000 -65.00092 -36.11167
      --------------------------------------------------------------------------------

      Absorbed degrees of freedom:
      -----------------------------------------------------+
      Absorbed FE | Categories - Redundant = Num. Coefs |
      -------------+---------------------------------------|
      YR1 | 2 0 2 |
      YR2 | 2 1 1 |
      YR3 | 2 1 1 ?|
      YR4 | 2 1 1 ?|
      YR5 | 2 1 1 ?|
      YR6 | 2 1 1 ?|
      YR7 | 2 1 1 ?|
      YR8 | 2 1 1 ?|
      YR9 | 2 1 1 ?|
      YR10 | 2 1 1 ?|
      YR11 | 2 1 1 ?|
      YR12 | 2 1 1 ?|
      YR13 | 2 1 1 ?|
      YR14 | 2 1 1 ?|
      YR15 | 2 1 1 ?|
      YR16 | 2 1 1 ?|
      YR17 | 2 1 1 ?|
      YR18 | 2 1 1 ?|
      YR19 | 2 1 1 ?|
      YR20 | 2 1 1 ?|
      YR21 | 2 1 1 ?|
      Imp_FE1 | 2 1 1 ?|
      Imp_FE2 | 2 1 1 ?|
      Imp_FE3 | 2 1 1 ?|
      Imp_FE4 | 2 1 1 ?|
      Imp_FE5 | 2 1 1 ?|
      Imp_FE6 | 2 1 1 ?|
      Imp_FE7 | 2 1 1 ?|
      Imp_FE8 | 2 1 1 ?|
      Imp_FE9 | 2 1 1 ?|
      Imp_FE10 | 2 1 1 ?|
      Imp_FE11 | 2 1 1 ?|
      Imp_FE12 | 2 1 1 ?|
      Imp_FE13 | 2 1 1 ?|
      Imp_FE14 | 2 1 1 ?|
      Imp_FE15 | 2 1 1 ?|
      Imp_FE16 | 2 1 1 ?|
      Imp_FE17 | 2 1 1 ?|
      Imp_FE18 | 2 1 1 ?|
      Imp_FE19 | 2 1 1 ?|
      Imp_FE20 | 2 1 1 ?|
      Imp_FE21 | 2 1 1 ?|
      Imp_FE22 | 2 1 1 ?|
      Imp_FE23 | 2 1 1 ?|
      Imp_FE24 | 2 1 1 ?|
      Imp_FE25 | 2 1 1 ?|
      Imp_FE26 | 2 1 1 ?|
      Imp_FE27 | 2 1 1 ?|
      Imp_FE28 | 2 1 1 ?|
      Imp_FE29 | 2 1 1 ?|
      Imp_FE30 | 2 1 1 ?|
      -----------------------------------------------------+
      ? = number of redundant parameters may be higher
      2. I then ran the RESET test, following the sample code on the Log of Gravity page, and obtained a non-significant result (chi-squared statistic of 2.51, p-value of 0.113). After that, I improved my model by including the forest cover of the exporting country and, on running the RESET test again, obtained a positive result. My question is: am I using the correct code for the RESET test, and am I right that the solution to this problem is to improve my model?
      The results of my RESET tests are the following.

      First code and its result:
      ppmlhdfe trade_Musd ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5 , absorb(YR* Imp_FE*)

      predict fit, xb

      gen fit2=fit^2

      ppmlhdfe trade_Musd ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5 fit2 , absorb(YR* Imp_FE*)

      test fit2=0

      ( 1) fit2 = 0

      chi2( 1) = 2.51
      Prob > chi2 = 0.1130

      Second code and its result:
      ppmlhdfe may23trade ln_dist_CAP ln_gdp15_D ln_for_o commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5, absorb(YR* Imp_FE*)

      predict fit, xb

      gen fit2=fit^2

      ppmlhdfe may23trade ln_dist_CAP ln_gdp15_D ln_for_o commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5 fit2, absorb(YR* Imp_FE*)

      test fit2=0

      ( 1) fit2 = 0

      chi2( 1) = 17.12
      Prob > chi2 = 0.0000
      I look forward to your guidance on these matters.

      Thank you and best regards,

      James
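
      The RESET steps above can be sketched as one do-file fragment that is safe to rerun. This is a hedged sketch that follows the first specification as posted: the locals x and fe are introduced here only for brevity, and the absorb() list is copied from the post rather than recommended.

      ```stata
      * Sketch only: regressor and absorb() lists copied from the first specification above
      local x  ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5
      local fe YR* Imp_FE*

      * Drop leftovers from an earlier run, otherwise predict stops because fit already exists
      capture drop fit fit2

      * Step 1: estimate the model and keep the fitted linear index
      ppmlhdfe trade_Musd `x', absorb(`fe')
      predict double fit, xb

      * Step 2: add the squared index as an extra regressor and test its coefficient
      generate double fit2 = fit^2
      ppmlhdfe trade_Musd `x' fit2, absorb(`fe')
      test fit2 = 0
      ```

      The null hypothesis of the test is that the coefficient on fit2 is zero, so a small p-value is evidence against the specification, while a large one means the test does not reject it.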

      • #4
        Dear James Baloto,

        You are not using the command correctly. You should not absorb the dummies but the variables identifying the categories. For example, rather than absorbing one dummy for each year, you should just absorb the variable year (please check the help file).

        Also, do you have a single exporter? If not, you probably want to include exporter fixed effects.

        Best wishes,

        Joao
        Last edited by Joao Santos Silva; 26 May 2024, 23:22.
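
        If the data only contain the dummies, the category variables can be rebuilt from them before absorbing. A minimal sketch, assuming the dummies are named YR1 to YR21 and Imp_FE1 to Imp_FE30 as in the earlier posts; year_id and imp_id are hypothetical names introduced here.

        ```stata
        * Sketch only: year_id and imp_id are hypothetical names
        * Rebuild a single year identifier from the 21 year dummies
        generate int year_id = .
        forvalues t = 1/21 {
            replace year_id = `t' if YR`t' == 1
        }

        * Rebuild a single importer identifier from the 30 importer dummies
        generate int imp_id = .
        forvalues j = 1/30 {
            replace imp_id = `j' if Imp_FE`j' == 1
        }

        * Absorb the two category variables instead of the 51 dummies
        ppmlhdfe trade_Musd ln_dist_CAP ln_gdp15_O ln_gdp15_D commlang_off contg comcol gee_reporter_5 rqe_reporter_5 rle_reporter_5, absorb(year_id imp_id)
        ```

        With more than one exporter, an exporter identifier built the same way could be added to absorb().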

        • #5
          Thank you for the comment, Prof. Silva. After checking the help file for ppmlhdfe, I revised my code to include importer and exporter fixed effects.

          ppmlhdfe trade dist gdp15 contig comlang_off comrelig col45 wto gee_r rqe_r rle_r, a( importer##c.year exporter##c.year)

          • #6
            Do not use the c. before the year variable.
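
            The two forms ask for different things, which a short side-by-side sketch may make clearer. It assumes the variable names from post #5; the local x is introduced here only for brevity.

            ```stata
            * Sketch only: variable names follow post #5
            local x dist gdp15 contig comlang_off comrelig col45 wto gee_r rqe_r rle_r

            * importer##c.year: one fixed effect per importer plus an importer-specific
            * linear trend in year (year is treated as a continuous variable)
            ppmlhdfe trade `x', absorb(importer##c.year exporter##c.year)

            * importer#year: one fixed effect for every importer-year combination
            ppmlhdfe trade `x', absorb(importer#year exporter#year)
            ```

            The second form is the usual importer-time and exporter-time fixed-effects structure in gravity models.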

            • #7
              Thank you for your feedback, Professor Silva. I modified the code as suggested; however, this resulted in the exclusion of several observations and the omission of key variables that are crucial for my analysis. Could you please advise on how to address these issues?

              ppmlhdfe trade dist gdp15 contig comlang_off comrelig col45 wto gee_r rqe_r rle_r, a( importer#year exporter#year)

              (dropped 51 observations that are either singletons or separated by a fixed effect)
              warning: dependent variable takes very low values after standardizing (4.6093e-07)

              note: 5 variables omitted because of collinearity: ln_gdp15_O wto_o gee_reporter rqe_reporter rle_reporter
              Iteration 1: deviance = 8.0111e+04 eps = . iters = 7 tol = 1.0e-04 min(eta) = -4.05 P
              Iteration 2: deviance = 4.7026e+04 eps = 7.04e-01 iters = 6 tol = 1.0e-04 min(eta) = -7.06
              Iteration 3: deviance = 3.9629e+04 eps = 1.87e-01 iters = 7 tol = 1.0e-04 min(eta) = -10.53
              Iteration 4: deviance = 3.8342e+04 eps = 3.36e-02 iters = 7 tol = 1.0e-04 min(eta) = -13.06
              Iteration 5: deviance = 3.8261e+04 eps = 2.11e-03 iters = 7 tol = 1.0e-04 min(eta) = -13.77
              Iteration 6: deviance = 3.8258e+04 eps = 8.96e-05 iters = 6 tol = 1.0e-04 min(eta) = -13.81
              Iteration 7: deviance = 3.8258e+04 eps = 1.16e-05 iters = 5 tol = 1.0e-05 min(eta) = -13.81
              Iteration 8: deviance = 3.8258e+04 eps = 1.07e-06 iters = 2 tol = 1.0e-05 min(eta) = -13.81
              Iteration 9: deviance = 3.8258e+04 eps = 2.14e-08 iters = 3 tol = 1.0e-06 min(eta) = -13.81 S
              Iteration 10: deviance = 3.8258e+04 eps = 1.54e-11 iters = 2 tol = 1.0e-07 min(eta) = -13.81 S
              Iteration 11: deviance = 3.8258e+04 eps = 3.47e-15 iters = 4 tol = 1.0e-09 min(eta) = -13.81 S O
              ------------------------------------------------------------------------------------------------------------
              (legend: p: exact partial-out s: exact solver h: step-halving o: epsilon below tolerance)
              Converged in 11 iterations and 56 HDFE sub-iterations (tol = 1.0e-08)

              HDFE PPML regression No. of obs = 1,867
              Absorbing 2 HDFE groups Residual df = 1,232
              Wald chi2(5) = 223.53
              Deviance = 38257.53494 Prob > chi2 = 0.0000
              Log pseudolikelihood = -22722.68485 Pseudo R2 = 0.8876
              ------------------------------------------------------------------------------
              | Robust
              trade | Coefficient std. err. z P>|z| [95% conf. interval]
              -------------+----------------------------------------------------------------
              dist | -4.308031 .4017945 -10.72 0.000 -5.095534 -3.520528
              gdp15 | 0 (omitted)
              contig | -.5269193 .2717843 -1.94 0.053 -1.059607 .0057682
              comlang_off | -.8912416 .156393 -5.70 0.000 -1.197766 -.5847169
              comrelig | -.3250128 .3072165 -1.06 0.290 -.927146 .2771204
              col45 | .5033801 .1017159 4.95 0.000 .3040206 .7027396
              wto_o | 0 (omitted)
              gee_r | 0 (omitted)
              rqe_r | 0 (omitted)
              rle_r | 0 (omitted)
              _cons | 41.9525 3.38319 12.40 0.000 35.32157 48.58343
              ------------------------------------------------------------------------------

              Absorbed degrees of freedom:
              ---------------------------------------------------------+
              Absorbed FE | Categories - Redundant = Num. Coefs |
              -----------------+---------------------------------------|
              importer#year | 551 0 551 |
              exporter#year | 99 20 79 |
              ---------------------------------------------------------+

              Last edited by James Baloto; 28 May 2024, 12:21.

              • #8
                I believe that those variables are dropped because they are collinear with the fixed effects. Do not worry about the dropped observations.
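
                One way to check this is to see whether each omitted variable varies at all within exporter-year cells. A minimal sketch, assuming the variable names from the command in post #7; the variable varies is a hypothetical helper.

                ```stata
                * Sketch only: variable names follow the command in post #7
                foreach v of varlist gdp15 wto gee_r rqe_r rle_r {
                    * Sort each exporter-year cell by the variable, then compare first and last value
                    bysort exporter year (`v'): generate byte varies = `v'[1] != `v'[_N]
                    quietly count if varies
                    display "`v': " r(N) " observations in exporter-year cells where it varies"
                    drop varies
                }
                ```

                A count of zero means the variable takes a single value in every exporter-year cell, so it is a linear combination of the exporter#year fixed effects and its coefficient cannot be identified separately from them.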
