Interpretation of coefficients in gravity model using PPML (dummies)

Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#31

31 Mar 2020, 10:57

Dear Maria Lopez,

Your dependent variable is in levels, but the gravity model is exponential, which implies that the coefficients are interpreted as in a log-log regression. So, for example, the coefficient on log distance is an elasticity.
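
To illustrate the point, here is a minimal Python sketch. The coefficients b0 and b1 and the distances are made-up values chosen for the example, not estimates from any data set. It checks that, in an exponential model for the conditional mean, a 1% increase in distance changes expected trade by roughly b1 percent.

```python
import math

# Hypothetical coefficients for E[trade | dist] = exp(b0 + b1 * ln(dist)).
# These values are illustrative assumptions, not estimates.
b0 = 2.0
b1 = -0.8


def expected_trade(dist):
    """Conditional mean of trade implied by the exponential model."""
    return math.exp(b0 + b1 * math.log(dist))


base = expected_trade(1000.0)
bumped = expected_trade(1010.0)  # distance is 1% larger

# Percentage change in expected trade caused by the 1% change in distance.
pct_change = 100.0 * (bumped / base - 1.0)
print(f"1% more distance changes expected trade by {pct_change:.3f}%")
```

The printed change is close to b1 itself, which is what "elasticity" means here; the value of b0 cancels out of the ratio.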

Best wishes,

Joao
Comment
Maria Lopez

Join Date: Dec 2019

Posts: 10
#32

02 Apr 2020, 00:26

Dear professor Santos,

Thank you so much for your help.

Best,

Maria
Comment
Maria Lopez

Join Date: Dec 2019

Posts: 10
#33

10 Jun 2020, 08:47

Dear professor Santos,

As you know, I have been working on a gravity model to explain the trade flows between Nicaragua and its trading partners.

What could explain a negative and significant coefficient on the variable "Adjacency (contiguity)" for trade in agricultural products?

Do you know of any study that found the same negative sign for this variable?

One of the explanations I gave was:

Lanuza & Bone (2013) suggested this is because the variable contiguity is likely collinear with the variables same country and common language, since Nicaragua shares a border with Costa Rica and Honduras, countries that were once part of the same country, the Federal Republic of Central America.

However, my advisor is not so happy with the negative sign.

Thank you so much.

Best,

Maria
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#34

10 Jun 2020, 10:06

Dear Maria Lopez,

First of all, please make sure the coefficient is identified; that is, make sure that you are not including a "fixed effect" that makes that regressor redundant.

If the coefficient is really identified, a possible explanation is that the economies of the adjacent countries are too similar: if they all produce essentially the same goods, there is little reason to trade, so they trade less than expected.

Best wishes,

Joao
Comment

Long Nguyenn

Join Date: Jun 2020
Posts: 19

#35

10 Jun 2020, 20:33

Dear Professor Joao Santos Silva,

I am working on a small sample of three countries with multilateral trade from 1998 to 2018 to better understand how to estimate a gravity model on a larger dataset.

Right now, my ppml command

Code:

. ppml lnvol lndist lnGDPimp lnGDPexp exporter_* importer_* year_*

gives the following result:

Code:

note: checking the existence of the estimates

Number of regressors excluded to ensure that the estimates exist: 4
Excluded regressors:  exporter_1 exporter_2 importer_2 year_21
Number of observations excluded: 0

note: starting ppml estimation
note: lnvol has noninteger values

Iteration 1:   deviance =  4.493896
Iteration 2:   deviance =  4.420987
Iteration 3:   deviance =  4.420984
Iteration 4:   deviance =  4.420984

Number of parameters: 27
Number of observations: 126
Pseudo log-likelihood: -260.46453
R-squared: .92118197
Option strict is: off
------------------------------------------------------------------------------
             |               Robust
       lnvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lndist |  -.1561562   .0309894    -5.04   0.000    -.2168942   -.0954182
    lnGDPimp |    .109938    .041569     2.64   0.008     .0284643    .1914116
    lnGDPexp |   .0555364   .0383281     1.45   0.147    -.0195854    .1306582
  exporter_3 |  -.1191625   .1528654    -0.78   0.436    -.4187731    .1804482
  importer_1 |  -.0974729   .0498772    -1.95   0.051    -.1952305    .0002847
  importer_3 |   .0270576    .201956     0.13   0.893    -.3687689    .4228841
      year_1 |  -.2258452   .0935905    -2.41   0.016    -.4092792   -.0424112
      year_2 |  -.2105067   .0929665    -2.26   0.024    -.3927178   -.0282957
      year_3 |   -.169045   .0887337    -1.91   0.057    -.3429598    .0048697
      year_4 |  -.1605192   .0825138    -1.95   0.052    -.3222432    .0012048
      year_5 |  -.1382407   .0744673    -1.86   0.063     -.284194    .0077125
      year_6 |   -.098291   .0650511    -1.51   0.131    -.2257887    .0292068
      year_7 |  -.0801831    .061846    -1.30   0.195    -.2013991    .0410329
      year_8 |  -.0822553   .0601515    -1.37   0.171    -.2001501    .0356395
      year_9 |  -.0772801   .0564199    -1.37   0.171     -.187861    .0333008
     year_10 |  -.0575561   .0509048    -1.13   0.258    -.1573278    .0422155
     year_11 |  -.0422892   .0481685    -0.88   0.380    -.1366976    .0521193
     year_12 |  -.0486424    .046542    -1.05   0.296    -.1398631    .0425783
     year_13 |  -.0343536   .0433686    -0.79   0.428    -.1193546    .0506473
     year_14 |  -.0248547    .041068    -0.61   0.545    -.1053465    .0556371
     year_15 |  -.0188818    .039707    -0.48   0.634    -.0967062    .0589426
     year_16 |  -.0153306   .0391906    -0.39   0.696    -.0921427    .0614815
     year_17 |  -.0105907   .0387608    -0.27   0.785    -.0865605    .0653791
     year_18 |  -.0085547   .0389603    -0.22   0.826    -.0849155    .0678061
     year_19 |  -.0075817   .0398676    -0.19   0.849    -.0857207    .0705573
     year_20 |  -.0030358   .0400476    -0.08   0.940    -.0815278    .0754561
       _cons |   2.462663   .4698884     5.24   0.000     1.541698    3.383627
------------------------------------------------------------------------------

To my understanding, the ppmlhdfe command

Code:

ppmlhdfe lnvol lndist lnGDPimp lnGDPexp, a(importer exporter year) vce(robust)

should give me the same result. However, my distance variable is omitted in this ppmlhdfe model:

Code:

note: 1 variable omitted because of collinearity: lndist
Iteration 1:   deviance = 4.4939e+00  eps = .         iters = 4    tol = 1.0e-04  min(eta) =   1.10  P
Iteration 2:   deviance = 4.4210e+00  eps = 1.65e-02  iters = 4    tol = 1.0e-04  min(eta) =   1.09
Iteration 3:   deviance = 4.4210e+00  eps = 6.85e-07  iters = 3    tol = 1.0e-04  min(eta) =   1.09
Iteration 4:   deviance = 4.4210e+00  eps = 3.93e-15  iters = 2    tol = 1.0e-05  min(eta) =   1.09
Iteration 5:   deviance = 4.4210e+00  eps = 1.04e-15  iters = 2    tol = 1.0e-08  min(eta) =   1.09   S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 5 iterations and 15 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =        126
Absorbing 3 HDFE groups                           Residual df     =         99
                                                  Wald chi2(2)    =       7.52
Deviance             =  4.420984129               Prob > chi2     =     0.0233
Log pseudolikelihood = -260.4645284               Pseudo R2       =     0.0909
------------------------------------------------------------------------------
             |               Robust
       lnvol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lndist |          0  (omitted)
    lnGDPimp |    .109938    .041569     2.64   0.008     .0284643    .1914116
    lnGDPexp |   .0555364   .0383281     1.45   0.147    -.0195854    .1306582
       _cons |   1.002081   .4973192     2.01   0.044     .0273532    1.976809
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    importer |         3           0           3     |
    exporter |         3           1           2     |
        year |        21           1          20    ?|
-----------------------------------------------------+
? = number of redundant parameters may be higher

Can you help me figure out what is going wrong with how I construct my command? I understand that distance is a time-invariant variable that is directly correlated with my fixed effects (importer, exporter, or country-pair) and is therefore omitted in the fixed-effects model. Are there any options in the ppmlhdfe command that could help me see the coefficient on distance?

Last edited by Long Nguyenn; 10 Jun 2020, 20:42.

Comment

Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#36

11 Jun 2020, 02:02

Dear Long Nguyenn,

My guess is that, with just 3 countries, distance is collinear with the fixed effects, so its coefficient is not identified. If you estimate the model using the ppml command and include the fixed effects immediately after the dependent variable and before distance, it will probably also drop the distance.
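
The collinearity can be checked by hand. The Python sketch below is only an illustration: the three country labels and the distances are made up, and it assumes distance is symmetric within a pair and constant over time. It constructs one effect per country such that the exporter effect plus the importer effect reproduces log distance exactly for all six directed flows, which is why the distance coefficient cannot be separated from the fixed effects.

```python
import itertools
import math

# Hypothetical log distances between three countries (symmetric by pair).
# The numbers are made up; any symmetric values lead to the same conclusion.
ln_dist = {
    frozenset(("A", "B")): math.log(500.0),
    frozenset(("A", "C")): math.log(1200.0),
    frozenset(("B", "C")): math.log(900.0),
}
d_ab = ln_dist[frozenset(("A", "B"))]
d_ac = ln_dist[frozenset(("A", "C"))]
d_bc = ln_dist[frozenset(("B", "C"))]

# One effect per country, solving effect[i] + effect[j] = ln_dist(i, j)
# for the three pairs (three equations, three unknowns).
country_effect = {
    "A": (d_ab + d_ac - d_bc) / 2,
    "B": (d_ab + d_bc - d_ac) / 2,
    "C": (d_ac + d_bc - d_ab) / 2,
}

# Check all six directed flows: exporter effect + importer effect = ln_dist.
max_gap = max(
    abs(
        country_effect[exporter]
        + country_effect[importer]
        - ln_dist[frozenset((exporter, importer))]
    )
    for exporter, importer in itertools.permutations("ABC", 2)
)
print(f"largest gap between ln_dist and the sum of country effects: {max_gap:.2e}")
```

With more countries there are generally more distinct pairwise distances than country effects, so this exact fit typically breaks down and the distance coefficient becomes identified.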

Best wishes,

Joao

Last edited by Joao Santos Silva; 11 Jun 2020, 02:04.
Comment
Long Nguyenn

Join Date: Jun 2020

Posts: 19
#37

11 Jun 2020, 13:57

Dear Professor Joao Santos Silva,

Thank you very much for your information.
Comment
Maria Lopez

Join Date: Dec 2019

Posts: 10
#38

14 Jun 2020, 20:11

Dear professor Santos,

Thank you very much for your help.

Best,

Maria
Comment
Long Nguyenn

Join Date: Jun 2020

Posts: 19
#39

15 Jun 2020, 18:42

Dear Joao Santos Silva,

Sorry to bother you again but I ran into some more problems. I would very much appreciate it if you could help me.

I am working with a sample consisting of 1 exporter country (Vietnam) and 30 importer countries from 1998 to 2018. My independent variables are exporter's and importer's GDPs, exporter's and importer's trade openness (% of GDP), and real effective exchange rate (data from https://www.bruegel.org/publications...-new-database/.)

My code is as follows:

Code:

ppmlhdfe exportvolume lnExrate lnEXPgdp lnIMPgdp lnEXPopen lnIMPopen, a(importer year) vce(cluster pairid)

Code:

note: 2 variables omitted because of collinearity: lnEXPgdp lnEXPopen

HDFE PPML regression                              No. of obs      =        609
Absorbing 2 HDFE groups                           Residual df     =         28
Statistics robust to heteroskedasticity           Wald chi2(3)    =       9.51
Deviance             =  118024.1806               Prob > chi2     =     0.0232
Log pseudolikelihood = -61627.18238               Pseudo R2       =     0.9566

Number of clusters (pairid) =         29
                                (Std. Err. adjusted for 29 clusters in pairid)
------------------------------------------------------------------------------
             |               Robust
exportvolume |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    lnExrate |  -.8901078   .3309642    -2.69   0.007    -1.538786   -.2414299
    lnEXPgdp |          0  (omitted)
    lnIMPgdp |   .3200044   .2898905     1.10   0.270    -.2481706    .8881794
   lnEXPopen |          0  (omitted)
   lnIMPopen |   .4969847   .3851513     1.29   0.197    -.2578979    1.251867
       _cons |   2.171014    4.67038     0.46   0.642    -6.982762    11.32479
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    importer |        29          29           0    *|
        year |        21           0          21     |
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

My questions are:

1) The variables that are related to my exporter country are omitted. Is there a way for me to show these variables? I understand that this is due to the fixed effects, as these variables are the same across country pairs. However, I do want to use them to explain the trade activity of a specific exporter.

2) Can you help me check my real effective exchange rate variable? This is an index number calculated as the CPI-based weighted average of a currency against a basket of currencies. I was thinking I could incorporate this variable as the exchange rate does have an impact on trade. I calculate the REER by dividing the REER index number of Vietnam by the REER index number of a partner country. For example, if in 2017 the REER for Vietnam is 104 and for Austria is 108, I would calculate the REER of Vietnam/Austria as 104/108 = 0.96.

I expected the REER to have a positive impact on export volume because a depreciation of the currency, which means the REER rises, makes exports cheaper, so exports should rise as well. However, the coefficient for REER in my model is significantly negative.

3) The variables that did have coefficients are not statistically significant even though I have doubled the sample size. Do you think my sample is still too small to produce a significant result? If not, then what seems to be the reason and how may I improve it?
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#40

17 Jun 2020, 05:13

Dear Long Nguyenn,

1) There is no way to show those results because they do not make sense. Those variables are being used in the sense that their effect is captured by the fixed effects, but their coefficients are not separately identified.

2) Sorry, I cannot help you here.

3) Yes, that is likely to be the case; the regressors may not have enough variability given all the fixed effects.
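
Point 1 can be illustrated with a small Python sketch. The panel below is hypothetical (one exporter, three importers, three years, made-up GDP values). Because there is a single exporter, its GDP takes one value per year, so it is an exact combination of the year dummies and no variation is left once the year effects are removed.

```python
from collections import defaultdict

# Hypothetical panel: one exporter, three importers, three years.
# ln_exp_gdp belongs to the exporter, so it takes one value per year.
ln_exp_gdp_by_year = {2016: 5.30, 2017: 5.37, 2018: 5.44}  # made-up values
importers = ["AUT", "JPN", "USA"]

rows = [
    {"importer": importer, "year": year, "ln_exp_gdp": value}
    for importer in importers
    for year, value in ln_exp_gdp_by_year.items()
]

# A variable that is constant within each year lies in the span of the
# year dummies; subtracting the within-year mean shows nothing is left.
values_by_year = defaultdict(list)
for row in rows:
    values_by_year[row["year"]].append(row["ln_exp_gdp"])
year_mean = {year: sum(v) / len(v) for year, v in values_by_year.items()}

residuals = [row["ln_exp_gdp"] - year_mean[row["year"]] for row in rows]
leftover = max(abs(r) for r in residuals)
print(f"variation left after removing year effects: {leftover:.2e}")
```

The importer's GDP, by contrast, varies across importers within a year and across years within an importer, so it keeps some variation after both sets of effects are removed.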

Joao
Comment
Long Nguyenn

Join Date: Jun 2020

Posts: 19
#41

17 Jun 2020, 23:06

Dear Professor Joao Santos Silva,

Thank you for your answer. I am following the work of Egger and Pfaffermayr (2003), where I think they regressed the individual effects, derived from the fixed effects estimator, on time-invariant variables such as distance, common language, etc. I was wondering if this method is still valid and, if so, what exactly I would be regressing here. How do I specify the individual effects?

Last edited by Long Nguyenn; 17 Jun 2020, 23:47.
Comment
Daniel Redekamp

Join Date: Jun 2020

Posts: 2
#42

30 Jun 2020, 03:09

Dear Professor Joao Santos Silva

I am new to Stata and building a gravity model for student migration movements.

I want to use the PPML estimator, but I am unsure about the interpretation of the coefficients and have two general questions about the ppml output:

If I calculate the gravity model with PPML-command in Stata, do I have to convert all coefficients of the variables that were not logarithmized?

I found the formula (100*(exp(beta)-1)) in this forum, but in the journal articles on student mobility I read, no conversion was mentioned anywhere. So I'm not sure I got it right.

In my work the dependent variable takes only positive values, so I also want to run an OLS regression to show the (huge) difference between the two methods.

For a comparison between OLS and PPML, which coefficients can I compare? The coefficients of the PPML-output or must a transformation for the non-log-variables be done (before comparing)?

Thanks a lot and best regards!

-------
These are my variables:
dependent variable:
bilateral student flows in PPML and log(bilateral student flows) in OLS

independent variables:
log of students sending country
log of students host country
log of distance between sending and host country capitals

dummy for non-eu-countries
number of universities in the TOP 200 of the Shanghai Ranking in the host country
GDP per capita for sending and host country
common spoken language from CEPII Database
common border from CEPII Database

I have 992 observations and only data for one year, no Panel, no FE etc.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3011
#43

30 Jun 2020, 03:54

Dear Daniel Redekamp,

The interpretation in PPML and in OLS on the logged outcome is exactly the same: for the regressors in logs the coefficients are elasticities, otherwise they are semi-elasticities. For the semi-elasticities, you should use the formula you mentioned, but it only makes a difference if the coefficient is larger than 0.1 in absolute value.
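
To see how much the conversion matters, here is a small Python sketch that compares the quick reading 100*beta with the formula 100*(exp(beta)-1). The coefficient values are illustrative only and do not come from any estimation; percent_effect is a helper name introduced for this example.

```python
import math


def percent_effect(beta):
    """Exact percentage change in the conditional mean for a one-unit change
    in a regressor that enters in levels: 100 * (exp(beta) - 1)."""
    return 100.0 * math.expm1(beta)


# Illustrative coefficient values, not taken from any estimation.
for beta in (0.02, 0.10, 0.50, -0.50):
    approx = 100.0 * beta
    exact = percent_effect(beta)
    print(f"beta = {beta:+.2f}: 100*beta = {approx:+.1f}%, exact = {exact:+.1f}%")
```

For small coefficients the two readings nearly coincide, while for larger ones (a dummy with beta = 0.5, say) the gap is substantial, which is why the conversion is usually reported for dummies.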

Best wishes,

Joao
Comment
Daniel Redekamp

Join Date: Jun 2020

Posts: 2
#44

30 Jun 2020, 07:57

Professor Joao Santos Silva, Thank you a lot for your answer!
Comment
Shikha Gupta

Join Date: Nov 2019

Posts: 13
#45

01 Jul 2020, 22:53

Dear Sir,

Please help me with these.

1. Can you suggest how to test for endogeneity when using PPML estimation? I have exports as my dependent variable and am using an RTA dummy in my model.
2. Is there any robustness check other than the RESET test?
3. While going through the posts I learnt that you suggested one more way to perform the RESET test, other than what is suggested on your page, that is,

Code:

predict XB, xb
qui su XB
gen XB2 = (XB-r(mean))^2
quietly ppml...
test XB2=0

What is the difference between the two?
Comment
