Dear all,
I am estimating a gravity model with PPML (the Poisson pseudo-maximum likelihood estimator) in order to account for zero trade values. The data set covers bilateral trade between one reference country and 58 partner countries in a single year. My dependent variable (trade) is measured in thousands of dollars and is left in levels. The explanatory variables are gdp (in thousands, in natural logs), distance (in natural logs), and dummies for common language and contiguity. I also include a policy index (an index of GMO regulations, my variable of interest) ranging from 0 to 5.
My current model thus looks like this (written schematically; in the actual command, gdp and dist are the logged variables):
xi: ppml trade ln(gdp) ln(distance) i.contiguity i.common_language gmoindex
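Written out, the command above corresponds to the usual PPML conditional-mean specification. This is only a sketch in my own notation: j indexes the 58 partner countries, and the beta labels are illustrative and do not come from the Stata output.

```latex
\[
E[\mathit{trade}_j \mid x_j]
  = \exp\bigl(\beta_0
      + \beta_1 \ln \mathit{gdp}_j
      + \beta_2 \ln \mathit{dist}_j
      + \beta_3\, \mathit{contig}_j
      + \beta_4\, \mathit{comlang}_j
      + \beta_5\, \mathit{gmoindex}_j\bigr),
\qquad j = 1, \dots, 58 .
\]
```

The six coefficients (including the constant) match the "Number of parameters: 6" line in the output.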
When I run this in Stata, my output looks like:
xi: ppml trade gdp dist i.contig i.comlang gmoindex
i.contig _Icontig_0-1 (naturally coded; _Icontig_0 omitted)
i.comlang _Icomlang_0-1 (naturally coded; _Icomlang_0 omitted)
note: checking the existence of the estimates
WARNING: trade has very large values, consider rescaling
WARNING: gdp has very large values, consider rescaling or recentering
Number of regressors excluded to ensure that the estimates exist: 0
Number of observations excluded: 0
note: starting ppml estimation
note: trade has noninteger values
Iteration 1: deviance = 1.35e+07
Iteration 2: deviance = 7885809
Iteration 3: deviance = 5257384
Iteration 4: deviance = 4276665
Iteration 5: deviance = 4098061
Iteration 6: deviance = 4089311
Iteration 7: deviance = 4089280
Iteration 8: deviance = 4089280
Iteration 9: deviance = 4089280
Number of parameters: 6
Number of observations: 58
Pseudo log-likelihood: -2044797.2
R-squared: .98220469
Option strict is: off
WARNING: The model appears to overfit some observations with trade=0
-------------------------------------------------------------------------------
              |             Semirobust
  soyaArg2008 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
          gdp |    2.28884   .4709613     4.86   0.000     1.365773    3.211908
         dist |   9.712658   1.676844     5.79   0.000     6.426104    12.99921
   _Icontig_1 |   19.45055   4.171823     4.66   0.000     11.27393    27.62717
  _Icomlang_1 |   5.550667   1.874701     2.96   0.003      1.87632    9.225013
     gmoindex |  -10.93835   3.844079    -2.85   0.004     -18.4726   -3.404092
        _cons |  -126.2497   23.13639    -5.46   0.000    -171.5962   -80.90324
-------------------------------------------------------------------------------
I have two concerns about my output.
- I am concerned about heteroscedasticity and therefore want robust standard errors. However, all of my attempts have produced only “semirobust” standard errors, and the , robust option does not work with ppml. From other posts, it appears that clustering may resolve this, but I don’t understand what kind of clusters to use or which variable to cluster on. Would clustering give robust standard errors, or is there another way to get them?
- The output above warns that the model “appears to overfit some observations with trade=0.” I believe this has to do with how the dummy variables are defined or omitted (based on the Statalist post: http://www.statalist.org/forums/foru...ariance-matrix). I tried xi, noomit: ppml [model], but the warning did not go away. I also tried dropping the i. prefix from my dummies (which I had already created manually in Excel), but this did not remove the warning either.
Best regards,
Erik