PPML estimation - regressors excluded

Ruken Kirkan

Join Date: May 2020
Posts: 18

#16

27 Jul 2020, 06:08

Hi Tom,
I tried running the regression again, and these were the results:

Code:

. ppmlhdfe export loggdpi loggdpj logdist contig comlang_off gatt_i gatt_j fta_hmr ebola_only_i ebola_only_j ebola_both, absorb(exp_id imp_id)
warning: dependent variable takes very low values after standardizing (3.5204e-09)
note: 4 variables omitted because of collinearity: gatt_i fta_hmr ebola_only_j ebola_both
Iteration 1:   deviance = 2.9429e+05  eps = .         iters = 5    tol = 1.0e-04  min(eta) =  -5.37  P
Iteration 2:   deviance = 2.1559e+05  eps = 3.65e-01  iters = 3    tol = 1.0e-04  min(eta) =  -7.20
Iteration 3:   deviance = 2.0040e+05  eps = 7.58e-02  iters = 3    tol = 1.0e-04  min(eta) =  -9.17
Iteration 4:   deviance = 1.9787e+05  eps = 1.28e-02  iters = 3    tol = 1.0e-04  min(eta) = -10.50
Iteration 5:   deviance = 1.9749e+05  eps = 1.96e-03  iters = 3    tol = 1.0e-04  min(eta) = -11.45
Iteration 6:   deviance = 1.9739e+05  eps = 4.92e-04  iters = 3    tol = 1.0e-04  min(eta) = -12.45
Iteration 7:   deviance = 1.9736e+05  eps = 1.60e-04  iters = 2    tol = 1.0e-04  min(eta) = -13.41
Iteration 8:   deviance = 1.9735e+05  eps = 4.98e-05  iters = 2    tol = 1.0e-04  min(eta) = -14.30
Iteration 9:   deviance = 1.9735e+05  eps = 1.22e-05  iters = 2    tol = 1.0e-05  min(eta) = -15.03
Iteration 10:  deviance = 1.9735e+05  eps = 1.68e-06  iters = 2    tol = 1.0e-05  min(eta) = -15.47   S
Iteration 11:  deviance = 1.9735e+05  eps = 7.18e-08  iters = 2    tol = 1.0e-06  min(eta) = -15.59   S
Iteration 12:  deviance = 1.9735e+05  eps = 2.75e-10  iters = 2    tol = 1.0e-07  min(eta) = -15.60   S O
------------------------------------------------------------------------------------------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance)
Converged in 12 iterations and 32 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =      2,318
Absorbing 2 HDFE groups                           Residual df     =      2,285
                                                  Wald chi2(7)    =     179.26
Deviance             =   197345.334               Prob > chi2     =     0.0000
Log pseudolikelihood = -101651.8496               Pseudo R2       =     0.7111
------------------------------------------------------------------------------
             |               Robust
      export |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     loggdpi |   .0934371   .1455971     0.64   0.521    -.1919279    .3788022
     loggdpj |   1.824241    .285273     6.39   0.000     1.265116    2.383366
     logdist |  -1.995096   .4898565    -4.07   0.000    -2.955197   -1.034995
      contig |   1.557701   .6209295     2.51   0.012      .340702    2.774701
 comlang_off |    .353602    .122546     2.89   0.004     .1134162    .5937878
      gatt_i |          0  (omitted)
      gatt_j |  -.3391768   .4353503    -0.78   0.436    -1.192448    .5140941
     fta_hmr |          0  (omitted)
ebola_only_i |  -.5931968   .2234592    -2.65   0.008    -1.031169   -.1552249
ebola_only_j |          0  (omitted)
  ebola_both |          0  (omitted)
       _cons |   9.832597   4.527212     2.17   0.030     .9594241    18.70577
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
      exp_id |         6           0           6     |
      imp_id |        21           1          20     |
-----------------------------------------------------+


The fta_hmr variable is a time-invariant dummy, and I think that might be why it is omitted.

Kind regards,
Ruken


Tom Zylkin

Join Date: Nov 2016

Posts: 188
#17

27 Jul 2020, 06:30

Hi Ruken,
Those results look better. Notice that one of your gatt variables was dropped. The fta_hmr variable should be pair-specific, so it should be identified even if it is time-invariant. After all, log distance is time-invariant and is identified.
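One thing that may be worth ruling out, purely as an assumption since the output above cannot confirm it: with only 6 exporters in the sample, a pair-level dummy can end up taking the same value for every exporter of a given importer. In that case it depends on the importer alone and imp_id absorbs it. A minimal sketch of the check on made-up data (6 exporters, 21 importers, and a hypothetical agreement pattern):

```stata
* Toy data: 6 exporters x 21 importers (the agreement pattern is invented)
clear
set obs 126
generate exp_id = mod(_n - 1, 6) + 1
generate imp_id = floor((_n - 1) / 6) + 1

* Hypothetical case: every exporter has an agreement with importers 1-5
generate byte fta_hmr = imp_id <= 5

* Flag importers for which fta_hmr differs across exporters
bysort imp_id (fta_hmr): generate byte varies = fta_hmr[1] != fta_hmr[_N]
tabulate varies

* In this toy case the dummy never varies within importer,
* so the importer fixed effect absorbs it completely
assert varies == 0
```

On the real data only the bysort and tabulate lines are needed, and the same check can be repeated by exp_id. Finding variation within both is necessary for identification but does not guarantee it.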

Can you also check that "loggdpi" is indeed the natural log of country i's GDP? The coefficient looks smaller than I would expect.

Regards,
Tom
Ruken Kirkan

Join Date: May 2020

Posts: 18
#18

27 Jul 2020, 07:00

Hi Tom,
That sounds good. The fta_hmr variable is pair-specific, so it is odd that it was omitted.

I created the logged variables by taking the natural log of gdp_i and gdp_j, but after checking gdp_i and gdp_j again, I think there might be a scale issue.
I have attached the variable information here:

Looking at the format of gdp_i and gdp_j, I think the two are scaled differently. Could that be the problem?

Kind regards,
Ruken
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#19

27 Jul 2020, 07:20

Hi Ruken,
You should look at your actual data to determine if there is an issue. You should also look at summary statistics.
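A minimal sketch of what such a check could look like. The toy values below are made up so that the snippet runs on its own. With the real file you would skip the data-generating lines and run describe, summarize, and the assert on your own gdp_i and loggdpi (gdp_j and loggdpj work the same way).

```stata
* Toy data standing in for the real panel (values are invented)
clear
set obs 5
* Hypothetical GDP levels in current US dollars
generate double gdp_i = 10^(_n + 7)
* Natural log, as the regression expects
generate double loggdpi = ln(gdp_i)

* Storage types and display formats
describe gdp_i loggdpi

* Ranges and percentiles: a logged GDP should span a few units, not billions
summarize gdp_i loggdpi, detail

* Eyeball a few rows of the raw and logged values side by side
list gdp_i loggdpi in 1/5

* The logged variable should reproduce ln() of the raw one
assert reldif(loggdpi, ln(gdp_i)) < 1e-8
```

If the assert fails on the real data, the logged variable was built from something other than the raw series, for example a rescaled or already logged version of it.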
Regards,
Tom
Ruken Kirkan

Join Date: May 2020

Posts: 18
#20

27 Jul 2020, 09:51

Hi Tom,
I tried to recreate the values to see if I made a mistake but I still get the same values. Do you have an alternative solution? This is how my actual data looks

Kind regards,
Ruken
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#21

27 Jul 2020, 10:17

Those variables look okay but have different names than the ones used in your regression.
Ruken Kirkan

Join Date: May 2020

Posts: 18
#22

27 Jul 2020, 10:23

Hi Tom,
I logged the variables, and here they are

Kind regards,
Ruken
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#23

27 Jul 2020, 10:32

Hi Ruken,
It looked to me like they may already have been logged, but I am not sure. It is unclear what units gdpi and gdpj are in. If they are in billions of dollars, all is probably okay. Otherwise, I am not sure.
Regards,
Tom
Ruken Kirkan

Join Date: May 2020

Posts: 18
#24

27 Jul 2020, 11:26

Hi Tom,
Thank you very much for your help. I could not have done this without your help.

I have a final question, as I am trying to figure out whether I understand this correctly. Is the reason for using country-specific controls to account for multilateral resistance? And what do we control for in ppmlhdfe compared to ppml?

Kind regards,
Ruken
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#25

27 Jul 2020, 11:30

Hi Ruken,
ppmlhdfe is ppml estimation with a dedicated approach to estimating fixed effects that is computationally faster. There is no difference in the interpretation of coefficients.

The reason for including controls is that it's the best you can do without including the exporter-time and importer-time fixed effects. Don't worry so much about multilateral resistance, since you are trying to estimate the effect of a country-specific variable. As I mentioned before, if you want to also account for multilateral resistance you will need additional data on internal trade.

Regards,
Tom
Ruken Kirkan

Join Date: May 2020

Posts: 18
#26

31 Jul 2020, 15:35

Hi Tom,
Thank you for all your help. I am very grateful for it as it helped me with understanding the estimations.
Kind regards,
Ruken
Joel Jansema

Join Date: Nov 2020

Posts: 8
#27

17 Nov 2020, 04:52

Dear Tom Zylkin,

I currently have a similar problem related to this topic for which I simply cannot find the right answer. I have tried several things in the last couple of days and have run out of options. Maybe you can help me.

For my master's thesis I am looking into the possible trade creation/diversion effect of the ASEAN-China FTA. I use panel data from 2000 to 2018 consisting of 339,604 observations and 164 countries. The dependent variable is bilateral exports, with no zero values included, and I also include several gravity control variables (distance, GDP, common language, etc.).

To measure the trade creation/diversion effect I follow Yang and Martinez-Zarzoso (2014) and use three dummy variables: ACFTA1 equals 1 if both countries are in the ACFTA, ACFTA2 equals 1 if the exporter is in the ACFTA and the importer is not, and ACFTA3 equals 1 if the importer is in the ACFTA and the exporter is not.

Following several papers, I have decided that my final equation will use ppmlhdfe with importer-time and exporter-time fixed effects plus country-pair fixed effects, with standard errors clustered by country pair. However, when I do this, ACFTA2 and ACFTA3 drop out due to collinearity. Almost any other combination of fixed effects keeps them, but as soon as I add importer-time and exporter-time fixed effects they drop out. I do not understand why, since this has been done in many influential papers.

The final equation will look as follows:
ppmlhdfe EXP ACFTA10 ACFTAexp ACFTAimp, absorb(exp_time imp_time) cluster(countrypair)

Do you have any idea why this is not possible for me but it is possible in other papers?

Kind regards,
Joel
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#28

17 Nov 2020, 07:40

Hi Joel,

Note that ACFTA1 + ACFTA2 would give you a dummy that is always equal to 1 for that exporter in that year. Thus, ACFTA1 and ACFTA2 are together collinear with your exporter-time fixed effect. A similar argument applies to ACFTA1 + ACFTA3 and the importer-time fixed effect. You would have to ask the authors of that paper why their results table shows estimates for those coefficients in the presence of exporter-time and importer-time fixed effects. I would expect there is some sort of typo (it happens).
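To see the collinearity concretely, here is a small sketch on made-up data: 4 hypothetical countries, 2 years, with countries 1 and 2 assumed to join the agreement in year 2. The names exporter, importer, mem_exp, and mem_imp are illustrative and not taken from the original post.

```stata
* Toy panel: 4 hypothetical countries, all directed pairs, 2 years
clear
set obs 32
generate exporter = mod(_n - 1, 4) + 1
generate importer = mod(floor((_n - 1) / 4), 4) + 1
generate year     = floor((_n - 1) / 16) + 1
* No internal trade flows in this toy example
drop if exporter == importer

* Assumption: countries 1 and 2 are members from year 2 onward
generate byte mem_exp = inlist(exporter, 1, 2) & year == 2
generate byte mem_imp = inlist(importer, 1, 2) & year == 2

* The three dummies as described in the question
generate byte ACFTA1 = mem_exp & mem_imp
generate byte ACFTA2 = mem_exp & !mem_imp
generate byte ACFTA3 = !mem_exp & mem_imp

egen exp_time = group(exporter year)
egen imp_time = group(importer year)

* ACFTA1 + ACFTA2 is constant within every exporter-year cell
generate byte sum12 = ACFTA1 + ACFTA2
bysort exp_time (sum12): assert sum12[1] == sum12[_N]

* ACFTA1 + ACFTA3 is constant within every importer-year cell
generate byte sum13 = ACFTA1 + ACFTA3
bysort imp_time (sum13): assert sum13[1] == sum13[_N]
```

Both asserts pass, which means each sum is an exact function of a fixed effect that is already absorbed. Only ACFTA1 can be identified alongside exp_time and imp_time.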

Regards,
Tom
Joel Jansema

Join Date: Nov 2020

Posts: 8
#29

20 Nov 2020, 06:39

Dear Tom,

Thank you for your answer, I understand what you are saying. I will look further into this matter!

Kind regards,
Maria Wang

Join Date: Feb 2022

Posts: 2
#30

02 Feb 2022, 03:41

Hi Tom Zylkin, I hope it is okay to use this thread to ask a question similar to previous ones.

I am estimating the effects of the EU ETS using panel data for 60 countries over 2000-2018. In the main estimation I use data from separate industries, but I run into problems when I try to estimate the effects for single industries.

The model I use is this:
ppmlhdfe imports eu rta eu_ets_imp eu_ets_exp, absorb(country_time partner_time pair_id) vce(cluster pair_id)

where eu_ets_imp equals 1 when the importer is in the ETS and eu_ets_exp equals 1 when the exporter is. The problem is that eu_ets_exp is dropped because of collinearity with the fixed effects. I don't quite understand why, as I also have intra-trade data where the EU ETS dummies are always 0, as well as observations where neither country is in the ETS.

When I use the basic ppml command, it drops some fixed effects rather than the EU ETS dummies, but it takes forever to compute. So my question is whether it is possible to keep specific regressors and drop fixed effects instead with the ppmlhdfe command.

Of course, there may just be an issue with my model. Because the ETS took effect in the same year for all participating countries (except 3 that joined a bit later), there may simply not be enough variation. Thanks in advance for any comments!