  • PPMLHDFE - omitted variables

    Hello community,

    I am using the ppmlhdfe regression to study the impact of common language (comlang_off) on trade (tradeflow_baci) between many African countries.

    For that I am using exporter fixed effects (exp_year), importer fixed effects (imp_year) and country pair fixed effects (pair_id).


    1 - Regression: ppmlhdfe tradeflow_baci fta_wto ln_dist contig comlang_off, a(exp_year imp_year pair_id) cluster (pair_id) nolog

    Output with country pair_id fixed effects:

    (dropped 283 observations that are either singletons or separated by a fixed effect)
    warning: dependent variable takes very low values after standardizing (5.1476e-09)
    note: 1 variable omitted because of collinearity: comlang_off
    note: ln_dist is probably collinear with the fixed effects (all partialled-out values are close to zero; tol = 1.0e-06)
    Converged in 13 iterations and 69 HDFE sub-iterations (tol = 1.0e-08)

    HDFE PPML regression                              No. of obs      =      9,051
    Absorbing 3 HDFE groups                           Residual df     =      2,047
    Statistics robust to heteroskedasticity           Wald chi2(2)    =       1.17
    Deviance             =  52472688.32               Prob > chi2     =     0.5574
    Log pseudolikelihood = -26271738.18               Pseudo R2       =     0.9676

    Number of clusters (pair_id)= 2,048
    (Std. err. adjusted for 2,048 clusters in pair_id)
    ------------------------------------------------------------------------------
                 |               Robust
    tradeflow_~i | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         fta_wto |  -.2064625   .2064227    -1.00   0.317    -.6110436    .1981186
         ln_dist |          0  (omitted)
          contig |   .0691352   .1640386     0.42   0.673    -.2523745    .3906449
     comlang_off |          0  (omitted)
           _cons |   13.10739   .1680008    78.02   0.000     12.77812    13.43667
    ------------------------------------------------------------------------------

    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
        exp_year |       306           1         305     |
        imp_year |       308           6         302     |
         pair_id |      2048        2048           0    *|
    -----------------------------------------------------+
    * = FE nested within cluster; treated as redundant for DoF computation

    2 - Regression without country pair_id: ppmlhdfe tradeflow_baci fta_wto ln_dist contig comlang_off, a(exp_year imp_year) cluster (pair_id) nolog

    Output without country pair_id fixed effects:

    (dropped 2 observations that are either singletons or separated by a fixed effect)
    warning: dependent variable takes very low values after standardizing (5.2246e-09)
    Converged in 11 iterations and 50 HDFE sub-iterations (tol = 1.0e-08)

    HDFE PPML regression                              No. of obs      =      9,332
    Absorbing 2 HDFE groups                           Residual df     =      2,328
    Statistics robust to heteroskedasticity           Wald chi2(4)    =     732.50
    Deviance             =  248048316.2               Prob > chi2     =     0.0000
    Log pseudolikelihood = -124060124.6               Pseudo R2       =     0.8487

    Number of clusters (pair_id)= 2,329
    (Std. err. adjusted for 2,329 clusters in pair_id)
    ------------------------------------------------------------------------------
                 |               Robust
    tradeflow_~i | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
         fta_wto |   .7144648   .1703926     4.19   0.000     .3805013    1.048428
         ln_dist |  -.8836056   .1328718    -6.65   0.000     -1.14403   -.6231816
          contig |    .773755   .1805465     4.29   0.000     .4198903     1.12762
     comlang_off |    .670514   .1545234     4.34   0.000     .3676537    .9733744
           _cons |   17.56397   1.064159    16.51   0.000     15.47825    19.64968
    ------------------------------------------------------------------------------

    Absorbed degrees of freedom:
    -----------------------------------------------------+
     Absorbed FE | Categories  - Redundant  = Num. Coefs |
    -------------+---------------------------------------|
        exp_year |       306           0         306     |
        imp_year |       308           6         302     |

    Question: in regression 1 (with pair_id fixed effects), the variable comlang_off (a dummy equal to 1 if the countries share the same language) is omitted because of collinearity, and I am not sure why. It looks like the country-pair fixed effects are absorbing the effect of comlang_off. If so, can I use regression 2, without the country-pair (pair_id) fixed effects, without any concern about endogeneity?

    Thoughts and comments are welcome.
    Thank you in advance

  • #2
    Dear Pericles Sa Nogueira,

    Indeed, in model 1 the pair fixed effects will absorb all characteristics of the pair that do not vary over time. So, if there are no changes to the languages, that variable (like distance) will drop out. In the second regression, the effect of FTAs is likely to be inflated by endogeneity, and this will affect the other estimates. Therefore, I suggest estimating model 2 without the FTA variable to see if the coefficient of interest changes much. If not, then you should be OK.

    Best regards,

    Joao
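
    The absorption described above can be checked directly in the data. The following is a minimal sketch, assuming the variable names from post #1 (pair_id, comlang_off); lang_varies and pair_tag are illustrative helper names that do not appear in the thread.

    ```stata
    * Sketch: does comlang_off ever change within a country pair?
    * Sorting by comlang_off within pair puts the smallest value first and the largest last
    * (missing values sort last, so pairs with missing comlang_off are also flagged)
    bysort pair_id (comlang_off): gen byte lang_varies = comlang_off[1] != comlang_off[_N]
    * Tag one observation per pair, then count the pairs in which the dummy varies
    egen byte pair_tag = tag(pair_id)
    count if pair_tag & lang_varies
    ```

    If the count is zero, comlang_off is constant within every pair and therefore perfectly collinear with the pair fixed effects, which is what the collinearity note in regression 1 reports.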



    • #3
      Dear Professor Joao Santos Silva,
      Thank you very much.

      Your reply is much appreciated.

      I've run model 2 without the FTA variable, and the coefficient on common language didn't change much (before = .670514 vs. now = .6927803). I believe there is no major concern about endogeneity.

      Below is the output for model 2 estimated without the FTA variable:

      . ppmlhdfe tradeflow_baci ln_dist contig comlang_off, a(exp_year imp_year) cluster (pair_id) nolog
      (dropped 2 observations that are either singletons or separated by a fixed effect)
      warning: dependent variable takes very low values after standardizing (5.2246e-09)
      Converged in 11 iterations and 49 HDFE sub-iterations (tol = 1.0e-08)

      HDFE PPML regression                              No. of obs      =      9,332
      Absorbing 2 HDFE groups                           Residual df     =      2,328
      Statistics robust to heteroskedasticity           Wald chi2(3)    =     683.35
      Deviance             =  258104647.5               Prob > chi2     =     0.0000
      Log pseudolikelihood = -129088290.3               Pseudo R2       =     0.8426

      Number of clusters (pair_id)= 2,329
      (Std. err. adjusted for 2,329 clusters in pair_id)
      ------------------------------------------------------------------------------
                   |               Robust
      tradeflow_~i | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           ln_dist |  -1.211516   .1389983    -8.72   0.000    -1.483948   -.9390842
            contig |   .6477741   .2003857     3.23   0.001     .2550253    1.040523
       comlang_off |   .6927803    .157021     4.41   0.000     .3850249    1.000536
             _cons |   20.42258   1.125855    18.14   0.000     18.21594    22.62921

      Thank you
      Péricles
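
      The comparison above can also be made side by side rather than by reading two separate outputs. This is a minimal sketch using the commands and variable names already shown in this thread; the stored-estimate names with_fta and no_fta are illustrative.

      ```stata
      * Sketch: estimate model 2 with and without the FTA variable and store both
      ppmlhdfe tradeflow_baci fta_wto ln_dist contig comlang_off, a(exp_year imp_year) cluster(pair_id) nolog
      estimates store with_fta
      ppmlhdfe tradeflow_baci ln_dist contig comlang_off, a(exp_year imp_year) cluster(pair_id) nolog
      estimates store no_fta
      * Coefficients and standard errors of the shared regressors in one table
      estimates table with_fta no_fta, b(%9.4f) se(%9.4f) keep(ln_dist contig comlang_off)
      ```

      Listing ln_dist and contig alongside comlang_off makes it easy to see which coefficients are stable across the two specifications and which are not.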



      • #4
        Dear Pericles Sa Nogueira,

        That is reassuring. Note, however, that the other coefficients change a lot, so do not read too much into them.

        Best wishes,

        Joao



        • #5
          In addition to what Prof. Santos Silva suggested, you could keep both the FTA variable and the pair fixed effects and use a two-stage approach: recover the estimated pair fixed effects from the first stage, then check in a second stage whether language is related to them.

          First Stage Regression:

          X_(ij,t) = exp(μ_(i,t) + π_(j,t) + δ_ij + β_z * Z_(ij,t) + ϵ_(ij,t))

          Where:
          - X_(ij,t) is the trade flow between exporter i and importer j at time t
          - μ_(i,t) represents exporter-time fixed effects
          - π_(j,t) represents importer-time fixed effects
          - δ_ij captures country-pair fixed effects
          - Z_(ij,t) is a vector of trade policy variables (such as FTA or WTO membership)

          Second Stage Regression:

          δ̂_ij = exp(ν_i + ζ_j + β_Lang * Lang_ij + β_k * K_ij + ϵ_ij)

          Where:
          - δ̂_ij is the estimated pair fixed effect
          - ν_i represents exporter-specific fixed effects
          - ζ_j represents importer-specific fixed effects
          - Lang_ij represents the language variable, which could measure linguistic proximity or a shared language between countries i and j
          - K_ij represents other time-invariant factors influencing trade between countries i and j

          Code:
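          * Load the example data (aggregate trade only) and build numeric country IDs
          use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
          egen imp = group(isoimp)
          egen exp = group(isoexp)
          * First stage: PPML with importer-year, exporter-year and pair fixed effects; save the estimated FEs
          ppmlhdfe trade fta, a(imp#year exp#year imp#exp, save) cluster(imp#exp)
          rename __hdfe3__ pair_fixed
          * Second stage: regress the estimated pair fixed effect on common language
          reghdfe pair_fixed comlang_off, a(imp#year exp#year) cluster(imp#exp)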
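
          The second-stage equation is written in exponential form, whereas the code uses a linear regression of the estimated pair effect. As a hedged sketch of an exponential variant, one could exponentiate the saved pair effect and run PPML on one observation per pair with exporter and importer fixed effects. Treating exp(δ̂_ij) as the dependent variable is an assumption made here, not something stated in this post, and exp_pair and pair_tag are illustrative names.

          ```stata
          * Sketch (assumption): PPML second stage on the exponentiated pair effect
          * Run after the first-stage ppmlhdfe and the rename to pair_fixed
          gen double exp_pair = exp(pair_fixed)
          * The pair effect is constant within pair, so keep one observation per pair
          egen byte pair_tag = tag(imp exp) if !missing(pair_fixed)
          ppmlhdfe exp_pair comlang_off if pair_tag, a(imp exp) vce(robust)
          ```

          Because the dependent variable is itself an estimate, the second-stage standard errors should be read with caution.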



          • #6
            Tariq Masood thank you for your very useful suggestion.

            Just to make sure I understood your point: do you mean that by running the code below I will be able to determine whether the pair fixed effects (__hdfe3__) saved by ppmlhdfe are absorbing the common-language effect?

            If my understanding is correct, in the example you shared common language appears to be significant, and thus it is related to the pair fixed effects. Right?


            Code:
             
            use "http://fmwww.bc.edu/RePEc/bocode/e/EXAMPLE_TRADE_FTA_DATA" if category=="TOTAL", clear
            egen imp = group(isoimp)
            egen exp = group(isoexp)
            ppmlhdfe trade fta, a(imp#year exp#year imp#exp, save) cluster(imp#exp)
            rename __hdfe3__ pair_fixed
            reghdfe pair_fixed comlang_off, a(imp#year exp#year) cluster(imp#exp)
            Output:

            (MWFE estimator converged in 3 iterations)

            HDFE Linear regression                            Number of obs   =      5,950
            Absorbing 2 HDFE groups                           F( 1, 1189)     =      59.20
            Statistics robust to heteroskedasticity           Prob > F        =     0.0000
                                                              R-squared       =     0.4922
                                                              Adj R-squared   =     0.4610
                                                              Within R-sq.    =     0.0447
            Number of clusters (imp#exp) = 1,190              Root MSE        =     0.8714

            (Std. err. adjusted for 1,190 clusters in imp#exp)
            ------------------------------------------------------------------------------
                         |               Robust
              pair_fixed | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
             comlang_off |   .7275164   .0945506     7.69   0.000     .5420119     .913021
                   _cons |  -1.059597   .0267121   -39.67   0.000    -1.112005   -1.007189

            Thanks



            • #7
              This is one of many strategies researchers use to get a sense of the effect of absorbed variables. The predicted pair fixed effects are a component of the dependent variable, so in the second stage you are indirectly testing whether the language variable determines trade flows. Because pair fixed effects are time-invariant, another approach is to use a time-varying measure (such as linguistic similarity), if available, which will not be absorbed by the pair fixed effects.
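
              As a rough sketch of that last idea, regression 1 from post #1 could be rerun with a time-varying language measure. The variable ling_sim is hypothetical: no such variable appears in this thread, and it stands in for whatever time-varying measure is available.

              ```stata
              * Sketch: ling_sim is a hypothetical time-varying language measure (not in the thread's data)
              * It stays identified with pair fixed effects only if it changes over time within pairs
              ppmlhdfe tradeflow_baci fta_wto ling_sim, a(exp_year imp_year pair_id) cluster(pair_id) nolog
              ```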
