  • PPML for gravity model estimation, question on CLUSTERING and ROBUST STANDARD ERRORS

    Dear all,

    I am estimating a gravity model with PPML (Poisson Pseudo-Maximum Likelihood estimator) in order to account for zero trade values. The data set includes bilateral trade between a reference country and 58 partner countries for a single year. My dependent variable (trade) is scaled into thousands of dollars, and is left in levels. Explanatory variables include gdp (scaled into thousands and natural logged), distance (natural logged), and dummies for common language and contiguity. I also include an indexed policy variable (an index of GMO regulations, my variable of interest) ranging from 0 to 5.

    My current model thus looks like:
    xi: ppml trade ln(gdp) ln(distance) i.contiguity i.common_language gmoindex

    When I run this in Stata, my output looks like:

    xi: ppml trade gdp dist i.contig i.comlang gmoindex
    i.contig _Icontig_0-1 (naturally coded; _Icontig_0 omitted)
    i.comlang _Icomlang_0-1 (naturally coded; _Icomlang_0 omitted)

    note: checking the existence of the estimates
    WARNING: trade has very large values, consider rescaling
    WARNING: gdp has very large values, consider rescaling or recentering

    Number of regressors excluded to ensure that the estimates exist: 0
    Number of observations excluded: 0

    note: starting ppml estimation
    note: trade has noninteger values

    Iteration 1: deviance = 1.35e+07
    Iteration 2: deviance = 7885809
    Iteration 3: deviance = 5257384
    Iteration 4: deviance = 4276665
    Iteration 5: deviance = 4098061
    Iteration 6: deviance = 4089311
    Iteration 7: deviance = 4089280
    Iteration 8: deviance = 4089280
    Iteration 9: deviance = 4089280

    Number of parameters: 6
    Number of observations: 58
    Pseudo log-likelihood: -2044797.2
    R-squared: .98220469
    Option strict is: off
    WARNING: The model appears to overfit some observations with trade=0
    -------------------------------------------------------------------------------
                  |             Semirobust
      soyaArg2008 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    --------------+----------------------------------------------------------------
              gdp |    2.28884   .4709613     4.86   0.000     1.365773    3.211908
             dist |   9.712658   1.676844     5.79   0.000     6.426104    12.99921
       _Icontig_1 |   19.45055   4.171823     4.66   0.000     11.27393    27.62717
      _Icomlang_1 |   5.550667   1.874701     2.96   0.003      1.87632    9.225013
         gmoindex |  -10.93835   3.844079    -2.85   0.004     -18.4726   -3.404092
            _cons |  -126.2497   23.13639    -5.46   0.000    -171.5962   -80.90324
    -------------------------------------------------------------------------------


    I have two concerns about my output.
    1. I am concerned about controlling for heteroscedasticity, and thus want robust standard errors. However, my various attempts have produced only “semirobust standard errors”. The , robust option does not work with ppml. After glancing through other posts, it appears that clustering may resolve this problem. However, I don’t understand what type of clusters I should use or which variables to cluster on. Would this give robust std. errors, or is there another way to get robust results?
    2. The above output gives the warning that the model “appears to overfit some observations with trade=0.” I believe this problem has to do with defining/omitting dummy variables (based on the Statalist post: http://www.statalist.org/forums/foru...ariance-matrix). I tried using xi, noomit: ppml [model], but the error did not go away. I also tried dropping the i. prefix from my dummies (which I already created manually in Excel), but this didn’t remove the warning either.
    I appreciate your help, and I apologize that much of this content is new for me, so some of my problems may be quite naïve.

    Best regards,
    Erik

  • #2
    Erik,

    The standard errors are robust; "semi-robust" is just the label Stata uses. Having said that, it is customary to cluster by country pair.
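
    A minimal sketch of how the clustered version could be requested, assuming ppml is installed from SSC and that gdp and distance are still in levels. The names lngdp, lndist and pair_id are hypothetical and do not appear in the original post. With one observation per country pair, as in a single-year cross-section against one reference country, every cluster contains a single observation, so the clustered standard errors should be essentially the same as the default ones.

    ```stata
    * Assumes ppml is installed: ssc install ppml
    * lngdp, lndist and pair_id are hypothetical names used for illustration.
    generate double lngdp  = ln(gdp)
    generate double lndist = ln(distance)

    * Default output: heteroskedasticity-robust ("Semirobust") standard errors
    ppml trade lngdp lndist contiguity common_language gmoindex

    * Standard errors clustered by country pair
    ppml trade lngdp lndist contiguity common_language gmoindex, cluster(pair_id)
    ```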

    Looking at your estimation results it looks as if you are indeed estimating a model where some coefficients are not identified. Please estimate the model with dummies for all categories and drop the constant.

    Finally, notice that your sample is very small.

    Best wishes,

    Joao

    • #3
      Hi Joao,

      Thank you so much for your response!

      Following your suggestion, I re-estimated my model without a constant term. This, however, appears to have a large effect on the signs and magnitude of my results, and I am trying to understand what exactly including/excluding the constant is doing, and which of my results I should believe.
      When I estimate my model with a constant term, I get the following output (which includes the warning that the model appears to overfit some observations):

      . xi: ppml soyaArg2008 gdp2008 distArg i.contigArg i.comlangArg gmoindex

      i.contigArg _IcontigArg_0-1 (naturally coded; _IcontigArg_0 omitted)
      i.comlangArg _IcomlangAr_0-1 (naturally coded; _IcomlangAr_0 omitted)

      note: checking the existence of the estimates
      WARNING: soyaArg2008 has very large values, consider rescaling
      WARNING: gdp2008 has very large values, consider rescaling or recentering

      Number of regressors excluded to ensure that the estimates exist: 0
      Number of observations excluded: 0

      note: starting ppml estimation
      note: soyaArg2008 has noninteger values

      Iteration 1: deviance = 1.35e+07
      Iteration 2: deviance = 7885809
      Iteration 3: deviance = 5257384
      Iteration 4: deviance = 4276665
      Iteration 5: deviance = 4098061
      Iteration 6: deviance = 4089311
      Iteration 7: deviance = 4089280
      Iteration 8: deviance = 4089280
      Iteration 9: deviance = 4089280

      Number of parameters: 6
      Number of observations: 58
      Pseudo log-likelihood: -2044797.2
      R-squared: .98220469
      Option strict is: off
      WARNING: The model appears to overfit some observations with soyaArg2008=0
      -------------------------------------------------------------------------------
                    |             Semirobust
        soyaArg2008 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
            gdp2008 |    2.28884   .4709613     4.86   0.000     1.365773    3.211908
            distArg |   9.712658   1.676844     5.79   0.000     6.426104    12.99921
      _IcontigArg_1 |   19.45055   4.171823     4.66   0.000     11.27393    27.62717
      _IcomlangAr_1 |   5.550667   1.874701     2.96   0.003      1.87632    9.225013
           gmoindex |  -10.93835   3.844079    -2.85   0.004     -18.4726   -3.404092
              _cons |  -126.2497   23.13639    -5.46   0.000    -171.5962   -80.90324
      -------------------------------------------------------------------------------


      However, when I estimate it without a constant term, I get this output, which is significantly different in sign and magnitude:

      xi: ppml soyaArg2008 gdp2008 distArg i.contigArg i.comlangArg gmoindex, noconstant

      i.contigArg _IcontigArg_0-1 (naturally coded; _IcontigArg_0 omitted)
      i.comlangArg _IcomlangAr_0-1 (naturally coded; _IcomlangAr_0 omitted)

      note: checking the existence of the estimates
      WARNING: soyaArg2008 has very large values, consider rescaling
      WARNING: gdp2008 has very large values, consider rescaling or recentering

      Number of regressors excluded to ensure that the estimates exist: 0
      Number of observations excluded: 0

      note: starting ppml estimation
      note: soyaArg2008 has noninteger values

      Iteration 1: deviance = 4.22e+07
      Iteration 2: deviance = 2.48e+07
      Iteration 3: deviance = 2.01e+07
      Iteration 4: deviance = 1.92e+07
      Iteration 5: deviance = 1.91e+07
      Iteration 6: deviance = 1.91e+07
      Iteration 7: deviance = 1.91e+07
      Iteration 8: deviance = 1.91e+07

      Number of parameters: 5
      Number of observations: 58
      Pseudo log-likelihood: -9561509
      R-squared: .05665912
      Option strict is: off
      -------------------------------------------------------------------------------
                    |             Semirobust
        soyaArg2008 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
            gdp2008 |   .6238821   .1778681     3.51   0.000     .2752669    .9724972
            distArg |  -.0493533   .3417985    -0.14   0.885    -.7192661    .6205594
      _IcontigArg_1 |  -1.382808   1.750803    -0.79   0.430    -4.814319    2.048702
      _IcomlangAr_1 |  -2.029582   1.064701    -1.91   0.057    -4.116357    .0571938
           gmoindex |   -1.83519   .9883912    -1.86   0.063    -3.772401    .1020208
      -------------------------------------------------------------------------------


      Furthermore, when I define my dummy variables within Stata (I’m not sure what exactly you meant by “estimating the model with dummies for all categories”), I get the following output, which differs for the no-constant regression, and is identical but with flipped signs on dummies for the regression with a constant.


      tabulate contigArg, generate(marg)

        contigArg |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |        166       97.08       97.08
                1 |          5        2.92      100.00
      ------------+-----------------------------------
            Total |        171      100.00

      . tabulate comlangArg, generate(narg)

       comlangArg |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |        151       88.30       88.30
                1 |         20       11.70      100.00
      ------------+-----------------------------------
            Total |        171      100.00


      . xi: ppml soyaArg2008 gdp2008 distArg marg1 narg1 gmoindex, noconstant

      note: checking the existence of the estimates
      WARNING: soyaArg2008 has very large values, consider rescaling
      WARNING: gdp2008 has very large values, consider rescaling or recentering

      Number of regressors excluded to ensure that the estimates exist: 0
      Number of observations excluded: 0

      note: starting ppml estimation
      note: soyaArg2008 has noninteger values

      Iteration 1: deviance = 4.02e+07
      Iteration 2: deviance = 2.46e+07
      Iteration 3: deviance = 2.05e+07
      Iteration 4: deviance = 1.97e+07
      Iteration 5: deviance = 1.96e+07
      Iteration 6: deviance = 1.96e+07
      Iteration 7: deviance = 1.96e+07
      Iteration 8: deviance = 1.96e+07

      Number of parameters: 5
      Number of observations: 58
      Pseudo log-likelihood: -9824457.9
      R-squared: .07352513
      Option strict is: off
      ------------------------------------------------------------------------------
                   |             Semirobust
       soyaArg2008 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           gdp2008 |   .5483715   .1512694     3.63   0.000      .251889     .844854
           distArg |   .1305202   .4223417     0.31   0.757    -.6972542    .9582946
             marg1 |  -.7637511   1.319131    -0.58   0.563    -3.349201    1.821699
             narg1 |   .5504681   .5499231     1.00   0.317    -.5273615    1.628298
          gmoindex |  -1.809164   .8919584    -2.03   0.043    -3.557371   -.0609579
      ------------------------------------------------------------------------------


      . xi: ppml soyaArg2008 gdp2008 distArg marg1 narg1 gmoindex

      note: checking the existence of the estimates
      WARNING: soyaArg2008 has very large values, consider rescaling
      WARNING: gdp2008 has very large values, consider rescaling or recentering

      Number of regressors excluded to ensure that the estimates exist: 0
      Number of observations excluded: 0

      note: starting ppml estimation
      note: soyaArg2008 has noninteger values

      Iteration 1: deviance = 1.35e+07
      Iteration 2: deviance = 7885809
      Iteration 3: deviance = 5257384
      Iteration 4: deviance = 4276665
      Iteration 5: deviance = 4098061
      Iteration 6: deviance = 4089311
      Iteration 7: deviance = 4089280
      Iteration 8: deviance = 4089280
      Iteration 9: deviance = 4089280

      Number of parameters: 6
      Number of observations: 58
      Pseudo log-likelihood: -2044797.2
      R-squared: .98220469
      Option strict is: off
      WARNING: The model appears to overfit some observations with soyaArg2008=0
      ------------------------------------------------------------------------------
                   |             Semirobust
       soyaArg2008 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           gdp2008 |    2.28884   .4709613     4.86   0.000     1.365773    3.211908
           distArg |   9.712658   1.676844     5.79   0.000     6.426104    12.99921
             marg1 |  -19.45055   4.171823    -4.66   0.000    -27.62717   -11.27393
             narg1 |  -5.550667   1.874701    -2.96   0.003    -9.225013    -1.87632
          gmoindex |  -10.93835   3.844079    -2.85   0.004     -18.4726   -3.404092
             _cons |  -101.2485   17.84926    -5.67   0.000    -136.2324   -66.26462
      ------------------------------------------------------------------------------


      Thus, I am rather confused about the effect of the constant and definition of dummy variables on my result. I apologize that this post is so long and non-technical, but I’m struggling with what to do about such wide variation in my results.

      Again, thank you for any response.

      Best regards,

      Erik

      • #4
        Dear Erik,

        I can see you are confused ;-)

        What you have to do is to estimate the model with dummies for all categories and no constant. For example, if one of your dummies was for gender you should include both the dummy for males and the dummy for females, rather than excluding one of them and therefore defining the base category.

        So, generate all the dummies you need (without excluding the base category), then estimate the model without a constant but including all the dummies (do not use the xi prefix). The results you get should be the ones you want. Please post them here so that we can compare them with the others you got, OK?
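
        The steps above can be sketched as follows, assuming the variable names from the earlier posts. The stubs contig_ and comlang_ are hypothetical. Because each complete set of dummies sums to one, the two sets are collinear with each other, so one indicator will presumably be dropped automatically.

        ```stata
        * tabulate, generate() creates one indicator per category:
        * contig_1 marks contigArg == 0 and contig_2 marks contigArg == 1
        * (likewise for comlang_1 and comlang_2).
        tabulate contigArg,  generate(contig_)
        tabulate comlangArg, generate(comlang_)

        * All categories included, no constant, no xi prefix
        ppml soyaArg2008 gdp2008 distArg contig_1 contig_2 comlang_1 comlang_2 gmoindex, noconstant
        ```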

        Best wishes,

        Joao

        • #5
          Dear Joao,

          Thank you again for your response. As you suggested, I re-estimated my model with all dummies and without a constant term. I’ve posted an example of the results below (1). These results look better than what I was getting before! I also got identical results by again avoiding the xi prefix, but using only the default dummy variables and including a constant (2, below).

          I do still have one concern. I previously estimated the same model (with different variables) using OLS and Tobit estimators, and each of these estimators produced results comparable to the other (3 and 4, below). Now that I estimate the model with PPML, however, my results change slightly (5 and 6, below). Specifically, my focus variable (in this case, called Labeling) goes from being insignificant under OLS and Tobit to significantly positive under PPML…

          Question: Is this simply because of PPML’s alternative way of estimating the model, or is there still something going wrong in my PPML code?

          Alternatively, could it be that the warnings of “dep. var. has very large values, consider rescaling” and “gdp has very large values, consider rescaling or recentering” are highlighting the existence of some outlier that alters the PPML results? There are a handful of trade values and gdp values in my data (China, for instance) that could be considered outliers. Note, though, that I have already scaled all dollar values into thousands of dollars.

          I look forward to your thoughts! And sincerely, thank you for your patience.

          Best regards,
          Erik


          1. SUGGESTED METHOD, WITH ALL DUMMY VARIABLES INCLUDED, NO CONSTANT:
          ppml soyaArg gdp distArg contigArgentina2 contigArgentina1 comlangArgentina2 comlangArgentina1 gmoindex, noconstant

          note: checking the existence of the estimates
          WARNING: soyaArg has very large values, consider rescaling
          WARNING: gdp has very large values, consider rescaling or recentering

          Number of regressors excluded to ensure that the estimates exist: 0
          Number of observations excluded: 0

          note: comlangArgentina1 omitted because of collinearity

          note: starting ppml estimation
          note: soyaArg has noninteger values

          Iteration 1: deviance = 3.03e+07
          Iteration 2: deviance = 1.85e+07
          Iteration 3: deviance = 1.41e+07
          Iteration 4: deviance = 1.28e+07
          Iteration 5: deviance = 1.27e+07
          Iteration 6: deviance = 1.27e+07
          Iteration 7: deviance = 1.27e+07
          Iteration 8: deviance = 1.27e+07
          Iteration 9: deviance = 1.27e+07

          Number of parameters: 6
          Number of observations: 174
          Pseudo log-likelihood: -6330420.1
          R-squared: .71772131
          Option strict is: off
          -----------------------------------------------------------------------------------
                            |             Semirobust
                 defsoyaArg |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ------------------+----------------------------------------------------------------
                        gdp |   1.827792   .2068734     8.84   0.000     1.422327    2.233256
                    distArg |   7.212482   .9982246     7.23   0.000     5.255998    9.168967
           contigArgentina2 |  -79.94674   10.18034    -7.85   0.000    -99.89984   -59.99364
           contigArgentina1 |  -93.93674   12.19166    -7.70   0.000     -117.832   -70.04153
          comlangArgentina2 |   4.975368   .8643526     5.76   0.000     3.281268    6.669468
                   gmoindex |  -8.602154   1.816455    -4.74   0.000    -12.16234   -5.041968
          -----------------------------------------------------------------------------------


          2. ALTERNATIVE METHOD WITH DEFAULT VARIABLES AND CONSTANT (SAME RESULTS)
          ppml soyaArg gdp distArg contigArg comlangArg gmoindex

          note: checking the existence of the estimates
          WARNING: soyaArg has very large values, consider rescaling
          WARNING: gdp has very large values, consider rescaling or recentering

          Number of regressors excluded to ensure that the estimates exist: 0
          Number of observations excluded: 0

          note: starting ppml estimation
          note: soyaArg has noninteger values

          Iteration 1: deviance = 3.03e+07
          Iteration 2: deviance = 1.85e+07
          Iteration 3: deviance = 1.41e+07
          Iteration 4: deviance = 1.28e+07
          Iteration 5: deviance = 1.27e+07
          Iteration 6: deviance = 1.27e+07
          Iteration 7: deviance = 1.27e+07
          Iteration 8: deviance = 1.27e+07
          Iteration 9: deviance = 1.27e+07

          Number of parameters: 6
          Number of observations: 174
          Pseudo log-likelihood: -6330420.1
          R-squared: .71772131
          Option strict is: off
          ------------------------------------------------------------------------------
                       |             Semirobust
               soyaArg |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   gdp |   1.827792   .2068734     8.84   0.000     1.422327    2.233256
               distArg |   7.212482   .9982246     7.23   0.000     5.255998    9.168967
             contigArg |      13.99   2.315775     6.04   0.000     9.451169    18.52884
            comlangArg |   4.975368   .8643526     5.76   0.000     3.281268    6.669468
              gmoindex |  -8.602154   1.816455    -4.74   0.000    -12.16234   -5.041968
                 _cons |  -93.93674   12.19166    -7.70   0.000     -117.832   -70.04153
          ------------------------------------------------------------------------------
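
          Outputs 1 and 2 appear to be the same fit under two parameterizations. With both contiguity dummies and no constant, the coefficient on the base-category dummy plays the role of the constant, and the difference between the two dummy coefficients is the usual contiguity effect. A quick check with the reported numbers:

          ```latex
          \begin{aligned}
          \hat\beta_{\text{\_cons}} &= \hat\beta_{\text{contigArgentina1}} = -93.93674, \\
          \hat\beta_{\text{contigArg}} &= \hat\beta_{\text{contigArgentina2}} - \hat\beta_{\text{contigArgentina1}}
            = -79.94674 - (-93.93674) = 13.99 .
          \end{aligned}
          ```

          The remaining coefficients, the pseudo log-likelihood and the R-squared are identical in the two outputs, which is what a pure reparameterization would produce.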


          3. OLS RESULTS (THIS TIME OF A DIFFERENT MODEL WITH FOCUS VARIABLE “LABELING”)
          regress soyaBra gdp distBra i.contigBra i.comlangBra Labeling, robust

          Linear regression                               Number of obs     =        174
                                                          F(5, 168)         =      70.19
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.2891
                                                          Root MSE          =      4.773

          ------------------------------------------------------------------------------
                       |               Robust
               soyaBra |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   gdp |   1.759846    .231146     7.61   0.000     1.303521    2.216171
               distBra |   1.510898     1.1257     1.34   0.181    -.7114426    3.733239
           1.contigBra |  -.1928695   1.814515    -0.11   0.915    -3.775057    3.389318
          1.comlangBra |   7.658716   .6122588    12.51   0.000     6.450003    8.867428
              Labeling |    .162902   1.164298     0.14   0.889    -2.135637    2.461441
                 _cons |  -43.44134    9.80927    -4.43   0.000    -62.80666   -24.07603
          ------------------------------------------------------------------------------


          4. TOBIT RESULTS
          tobit soyaBra gdp distBra i.contigBra i.comlangBra Labeling, ll(0)

          Tobit regression                                Number of obs     =        174
                                                          LR chi2(5)        =      58.39
                                                          Prob > chi2       =     0.0000
          Log likelihood = -369.79639                     Pseudo R2         =     0.0732

          ------------------------------------------------------------------------------
               soyaBra |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   gdp |   3.390641   .5126487     6.61   0.000     2.378621    4.402661
               distBra |   3.839523   2.170289     1.77   0.079     -.444846    8.123893
           1.contigBra |   2.505788   3.843805     0.65   0.515    -5.082269    10.09385
          1.comlangBra |   12.74682   4.627751     2.75   0.007     3.611175    21.88247
              Labeling |  -1.256573   2.427927    -0.52   0.605    -6.049544    3.536398
                 _cons |  -99.65792   22.31814    -4.47   0.000    -143.7162   -55.59968
          -------------+----------------------------------------------------------------
                /sigma |   7.707858    .642088                      6.440312    8.975405
          ------------------------------------------------------------------------------
          Obs. summary:         83  left-censored observations at soyaBra<=0
                                91     uncensored observations
                                 0 right-censored observations

          5. PPML RESULTS
          *Creating dummy variables*
          .
          . tabulate contigBra, generate(contigBrazil)

            contigBra |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |        162       93.10       93.10
                    1 |         12        6.90      100.00
          ------------+-----------------------------------
                Total |        174      100.00


          . tabulate comlangBra, generate(comlangBrazil)

           comlangBra |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                    0 |        171       98.28       98.28
                    1 |          3        1.72      100.00
          ------------+-----------------------------------
                Total |        174      100.00


          ppml soyaBra gdp distBra contigBrazil2 contigBrazil1 comlangBrazil2 comlangBrazil1 Labeling, noconstant

          note: checking the existence of the estimates
          WARNING: soyaBra has very large values, consider rescaling
          WARNING: gdp has very large values, consider rescaling or recentering

          Number of regressors excluded to ensure that the estimates exist: 1
          Excluded regressors: comlangBrazil1
          Number of observations excluded: 0

          note: starting ppml estimation
          note: soyaBra has noninteger values

          Iteration 1: deviance = 8.89e+07
          Iteration 2: deviance = 6.12e+07
          Iteration 3: deviance = 5.38e+07
          Iteration 4: deviance = 5.27e+07
          Iteration 5: deviance = 5.27e+07
          Iteration 6: deviance = 5.27e+07
          Iteration 7: deviance = 5.27e+07
          Iteration 8: deviance = 5.27e+07
          Iteration 9: deviance = 5.27e+07

          Number of parameters: 6
          Number of observations: 174
          Pseudo log-likelihood: -26332907
          R-squared: .87513519
          Option strict is: off
          --------------------------------------------------------------------------------
                         |             Semirobust
                 soyaBra |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
                     gdp |   1.280747   .1877322     6.82   0.000     .9127989    1.648695
                 distBra |   2.041035   .6487462     3.15   0.002     .7695162    3.312554
           contigBrazil2 |  -36.75786   5.016741    -7.33   0.000    -46.59049   -26.92523
           contigBrazil1 |  -37.78452   5.725895    -6.60   0.000    -49.00707   -26.56197
          comlangBrazil2 |   2.331063   .5565331     4.19   0.000     1.240278    3.421847
                Labeling |   6.048552   1.329141     4.55   0.000     3.443483     8.65362
          --------------------------------------------------------------------------------



          6. ALTERNATIVE PPML MODEL
          ppml soyaBra gdp distBra contigBra comlangBra Labeling

          note: checking the existence of the estimates
          WARNING: soyaBra has very large values, consider rescaling
          WARNING: gdp has very large values, consider rescaling or recentering

          Number of regressors excluded to ensure that the estimates exist: 0
          Number of observations excluded: 0

          note: starting ppml estimation
          note: soyaBra has noninteger values

          Iteration 1: deviance = 8.89e+07
          Iteration 2: deviance = 6.12e+07
          Iteration 3: deviance = 5.38e+07
          Iteration 4: deviance = 5.27e+07
          Iteration 5: deviance = 5.27e+07
          Iteration 6: deviance = 5.27e+07
          Iteration 7: deviance = 5.27e+07
          Iteration 8: deviance = 5.27e+07
          Iteration 9: deviance = 5.27e+07

          Number of parameters: 6
          Number of observations: 174
          Pseudo log-likelihood: -26332907
          R-squared: .87513519
          Option strict is: off
          ------------------------------------------------------------------------------
                       |             Semirobust
               soyaBra |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   gdp |   1.280747   .1877322     6.82   0.000     .9127989    1.648695
               distBra |   2.041035   .6487462     3.15   0.002     .7695162    3.312554
             contigBra |   1.026663   1.058568     0.97   0.332    -1.048092    3.101418
            comlangBra |   2.331063   .5565331     4.19   0.000     1.240278    3.421847
              Labeling |   6.048552   1.329141     4.55   0.000     3.443483     8.65362
                 _cons |  -37.78452   5.725895    -6.60   0.000    -49.00707   -26.56197
          ------------------------------------------------------------------------------



          • #6
            Dear Erik,

            It is not surprising that PPML leads to different results; that is why it is important to use it ;-)

            Anyway, I note that when you estimate by OLS and by Tobit, you are not using the dependent variable in logs. So, in the models estimated by OLS and Tobit you have additive effects, whereas in the Poisson regression you have multiplicative effects. That makes the models difficult to compare.
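
            To make the comparison concrete: PPML models E[y|x] = exp(x'b), so a closer OLS counterpart is a regression of ln(trade) on the same regressors, which necessarily drops the zero flows. A hedged sketch, assuming the variable names from the posts above. The name lnsoyaBra is hypothetical.

            ```stata
            * Log of the dependent variable; left missing where soyaBra is zero
            generate double lnsoyaBra = ln(soyaBra) if soyaBra > 0

            * Log-linear OLS: multiplicative model, positive flows only
            regress lnsoyaBra gdp distBra i.contigBra i.comlangBra Labeling, vce(robust)

            * PPML: multiplicative model, zeros retained
            ppml soyaBra gdp distBra contigBra comlangBra Labeling
            ```

            Even then the two sets of estimates can differ, because the log-linear regression discards the zeros and is sensitive to heteroskedasticity.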

            As for the warnings, they are not an indication of the presence of outliers. They are there because Stata sometimes has trouble handling variables with very large values. If you rescale everything into millions of dollars you may find that convergence is quicker, but the main results should not change.
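
            A sketch of the rescaling, assuming the dependent variable is currently in thousands of dollars as described in the first post. The name soyaBra_mn is hypothetical. Because the conditional mean is exponential, dividing the dependent variable by 1,000 should shift only the constant, by -ln(1000), which is about -6.908, and leave the slope coefficients unchanged.

            ```stata
            * Thousands of dollars -> millions of dollars
            generate double soyaBra_mn = soyaBra / 1000

            ppml soyaBra_mn gdp distBra contigBra comlangBra Labeling

            * Expected shift in _cons relative to the unscaled model
            display -ln(1000)
            ```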

            All the best,

            Joao

            • #7
              Originally posted by Joao Santos Silva
              Erik,

              The standard errors are robust, "semi-robust" is just a label used by Stata. Having said that, it is customary to cluster by country pair.

              Looking at your estimation results it looks as if you are indeed estimating a model where some coefficients are not identified. Please estimate the model with dummies for all categories and drop the constant.

              Finally, notice that your sample is very small.

              Best wishes,

              Joao

              Dear Joao,

              I would also like to obtain robust standard errors in a PPML regression. Are you suggesting in your message that they are already included in the ppml command by default?

              Thank you,
              Killian
              Last edited by Killian Foubert; 06 May 2017, 06:04.

              • #8
                Indeed, by default -ppml- gives robust standard errors (but not clustered); if you want to cluster, please use the appropriate option.

                Best wishes,

                Joao

                • #9
                  Dear Joao Santos Silva,

                  I'm doing a ppmlhdfe regression on FDI data on a country-pair-sector level. Can you tell me whether it would then make sense to cluster my standard errors on country-pair-sector-level or on country pair level? Do I have to do any test to find out which would be appropriate? Thank you very much in advance!

                  Best wishes
                  Noemi

                  • #10
                    Dear Noemi Seng,

                    If you have enough pairs, I would cluster at the country-pair level to account for the fact that sectors in the same pair may be correlated.
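
                    A minimal sketch of clustering at the country-pair level with ppmlhdfe, assuming the command is installed from SSC. All variable names (fdi, x1, x2, source, host, year, country_pair) are hypothetical placeholders, and the choice of fixed effects is only an illustration.

                    ```stata
                    * Assumes ppmlhdfe is installed: ssc install ppmlhdfe
                    * Observations are country-pair-sector-year; clusters are country pairs.
                    ppmlhdfe fdi x1 x2, absorb(source host year) vce(cluster country_pair)
                    ```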

                    Best wishes,

                    Joao

                    • #11
                      Dear Joao Santos Silva,

                      thank you so much for your quick reply. When clustering on country-pair level, I have 4,079 clusters (271,156 observations); when clustering on country-pair-sector level, I have 25,188 clusters. Would you say those 4,079 clusters point to having enough pairs? codebook country_pair reveals that I have 4,192 country pairs (I'm not quite sure why this results in only 4,079 and not 4,192 clusters).
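
                      One possible reason for the gap, offered as an assumption rather than a diagnosis: codebook counts distinct pairs in the whole data set, while the cluster count refers to the estimation sample, which excludes observations with missing values and any observations the estimator drops (for example singletons or separated observations). A sketch for counting the pairs in the estimation sample, to be run right after the regression. The name pair_tag is hypothetical.

                      ```stata
                      * Flag one observation per country pair within the estimation sample
                      egen byte pair_tag = tag(country_pair) if e(sample)

                      * Number of distinct country pairs actually used in the estimation
                      count if pair_tag == 1
                      ```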

                      I appreciate your answer so much.

                      Best
                      Noemi

                      • #12
                        Yes, that should be fine.

                        • #13
                          Dear Joao Santos Silva

                          thank you very much. At a country-pair-year level of aggregation, would you cluster the standard errors at the country-pair level as well? (I'm doing a robustness check where I aggregate the FDI data over sectors for each country pair.) In general, we cluster the standard errors to account for groupwise heteroskedasticity, right? But is there a test for this kind of heteroskedasticity that works with ppmlhdfe? I'm only aware of the xttest3 (modified Wald test) command for panel data.

                          I really appreciate your advice.

                          Best,
                          Noemi

                          • #14
                            Dear Noemi Seng,

                            I would cluster at the country-pair level, not country-pair-year nor country-pair-sector. We cluster to account for serial correlation, not heteroskedasticity.

                            Best wishes,

                            Joao

                            • #15
                              Dear Joao Santos Silva

                              thank you for the clarification. May I also ask: in the country-pair-sector-level regression with standard errors clustered at the country-pair level, which fixed effects would you include? I included time FE, source-country FE and host-country FE, but no source*time or host*time FE, as they would remove too much variation in my data set.

                              I would appreciate your thoughts on that.

                              Best
                              Noemi
