ppml, ppml_panel_sg and ppmlhdfe

NILUKA PERERA EKANAYAKE

Join Date: Apr 2020

Posts: 12
#1


19 Apr 2020, 19:44

Hi all,

I'm a PhD student and greatly appreciate your valuable comments on the following.

I'm running a gravity equation on a panel of around 50,000 country pairs over 20 years. My equation also includes many interaction variables (time-invariant variables interacted with year dummies), which I use to make the time-invariant variables time-varying.

When I ran the ppml command with country and time fixed effects, clustering the standard errors by country pair and using the strict option, I got the following:

Many warning messages saying that most of the interaction variables have very large values and that I should consider rescaling or recentering them. Also, 56 regressors were excluded to ensure that the estimates exist. Ultimately, after many iterations, the output was produced without standard errors or p-values.

My question is: can I run the command without the strict option?

Then I tried ppml_panel_sg and got results with all the values. However, the estimation took more than 10 hours. Also, when I tried to run the RESET test after the ppml_panel_sg estimation, it ran forever without producing any output.

I also tried ppmlhdfe. However, I got the error message r(3900) after a day, even after I increased the matsize setting.

Your kind comments are greatly appreciated.

Niluka
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#2

20 Apr 2020, 01:40

Dear NILUKA PERERA EKANAYAKE,

The strict option should not generally be used with PPML, so you should avoid it. It is normal for some variables to be excluded to ensure the existence of the estimates, but the warnings you are receiving and the problems you are having suggest that you should really rescale your variables.

Best wishes,

Joao
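
A minimal sketch of the rescaling suggested above, run on simulated data. The variable names bigvar1 and bigvar2 are hypothetical and stand in for any regressor that triggers the "very large values" warning. Dividing by the standard deviation is only one possible choice; dividing by a fixed constant such as 1e6 would serve the same purpose.

```stata
* Toy data: two hypothetical regressors with very large values
clear
set seed 12345
set obs 200
generate double bigvar1 = 1e9 * runiform()
generate double bigvar2 = 1e7 * rnormal()

* Rescale each variable to unit standard deviation
foreach v of varlist bigvar1 bigvar2 {
    quietly summarize `v'
    generate double `v'_sc = `v' / r(sd)
}

* Check the new scale before using the _sc variables as regressors
summarize bigvar1_sc bigvar2_sc
```

The coefficient on a rescaled regressor equals the original coefficient multiplied by the scaling constant (here, the standard deviation), so the fit of the model does not change.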
NILUKA PERERA EKANAYAKE

Join Date: Apr 2020

Posts: 12
#3

20 Apr 2020, 02:11

Dear Santos,

I greatly appreciate your valuable and quick response. I have now estimated the model without the strict option. I still got coefficient values without standard errors and p-values. The warning message says the variance matrix is nonsymmetric or highly singular. Could you please advise me on how to get rid of singular observations? I will rescale my variables and try again.

Thanks a lot
Niluka
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#4

20 Apr 2020, 08:42

Dear NILUKA PERERA EKANAYAKE,

There are a couple of possible reasons for this, but the most likely is that you have some perfectly collinear variables that Stata is not being smart enough to drop. So, besides rescaling the variables, please check that you are not including variables that are perfectly collinear with the fixed effects. Also, you should really try to use ppmlhdfe because it is much better at dealing with the fixed effects and should be much faster.

Best wishes,

Joao
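
One way to check whether a regressor is perfectly collinear with the fixed effects, sketched on simulated data: regress the candidate on the fixed-effect dummies and look at the R-squared. The names exporter, year, area_o and gdp_o are hypothetical. This is an illustrative check under those assumptions, not something ppml does for you.

```stata
* Toy panel: 10 exporters observed over 5 years
clear
set seed 12345
set obs 50
generate exporter = ceil(_n/5)
bysort exporter: generate year = _n

* Hypothetical time-invariant exporter characteristic
bysort exporter: generate double area_o = runiform()
bysort exporter: replace area_o = area_o[1]

* Hypothetical time-varying regressor
generate double gdp_o = runiform()

* An R-squared of 1 means the fixed effects explain the variable perfectly
foreach v of varlist area_o gdp_o {
    quietly regress `v' i.exporter i.year
    display "`v': R-squared on fixed effects = " %9.6f e(r2)
}
```

A variable that is fully explained by the fixed effects (area_o in this toy example) cannot be identified separately from them and should be left out of the gravity equation.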
NILUKA PERERA EKANAYAKE

Join Date: Apr 2020

Posts: 12
#5

21 Apr 2020, 04:17

Dear Santos,

Your advice saved me a lot of time. Thanks again.

I tried again with rescaled variables and successfully obtained results with p-values and standard errors. I'm using the predicted trade value from the gravity equation as an instrument for trade in an income effect analysis. I first ran my regression with xtreg and xi: reg, and then with ppml.

Now I want to select the most appropriate of these three models. I used the RESET test to choose the best specification. I would really like to use the ppml estimates because the coefficients are closer to those in the existing literature. However, the RESET test p-value is close to zero (0.0101) for ppml, while H0 is not rejected for xi: reg with importer, exporter and time fixed effects.

Can we say that the ppml model is not correctly specified if it fails the RESET test? My regression includes many interaction variables (variables interacted with time dummies, e.g. logdistance*year1).

The time you spend assisting junior academics like us is greatly appreciated.

Thanks
Niluka
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#6

21 Apr 2020, 08:58

Dear NILUKA PERERA EKANAYAKE,

I am glad it worked. Stata is very sensitive to these numerical issues.

I cannot imagine any real situation in which using a linear model for trade data is preferable to using PPML so that would be my choice. Maybe you can reconsider the specification of your model?

Best wishes,

Joao
NILUKA PERERA EKANAYAKE

Join Date: Apr 2020

Posts: 12
#7

27 Apr 2020, 01:01

Dear Joao Santos Silva,

Thank you so much for the valuable advice. I will try to change the specification of the model.

Best
Niluka
NILUKA PERERA EKANAYAKE

Join Date: Apr 2020

Posts: 12
#8

03 May 2020, 22:13

Dear Prof Joao Santos Silva,

Thank you so much for all the valuable comments on the previous posts in this thread. I would greatly appreciate it if you could confirm that the following code is correct. I'm a bit confused about the "predict" command: since under PPML the dependent variable is in levels, my understanding is that we should not take the exponential of the fitted values.

ppml totaltrade ldisyear1 ldisyear2 ldisyear3 ldisyear4 …………bpop_o bpop_d diso3_o* diso3_d* dyear*, cluster(idpair)
predict fitted_ppml_trade, mu

Then, to check the specification of the model, I took the square of the fitted values as follows to conduct the RESET test.
gen fitted_ppml_trade_power=fitted_ppml_trade^2

ppml totaltrade ldisyear1 ldisyear2 ldisyear3 ldisyear4 …………bpop_o bpop_d fitted_ppml_trade_power diso3_o* diso3_d* dyear* , cluster(idpair)
test fitted_ppml_trade_power=0

I used the fitted_ppml_trade as the predicted trade value in my income effect analysis without taking the Exp of the fitted values as we do under xtreg or xi:reg commands.

Your valuable comments are greatly appreciated.

Thank you
Niluka Perera Ekanayake
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#9

04 May 2020, 01:35

Dear NILUKA PERERA EKANAYAKE,

You want to use "predict fitted_ppml_trade, xb", not what you are doing.

Best wishes,

Joao
NILUKA PERERA EKANAYAKE

Join Date: Apr 2020

Posts: 12
#10

04 May 2020, 03:26

Dear Prof Joao Santos Silva,

So that means that, to predict the fitted values, I should do the following:

ppml totaltrade ldisyear1 ldisyear2 ldisyear3 ldisyear4 …………bpop_o bpop_d diso3_o* diso3_d* dyear*, cluster(idpair)
predict fitted_ppml_trade, xb

May I ask one more question? Once I have obtained the fitted values, do I need to take their exponential, as we do under xtreg and xi: reg?

Thank you so much for your prompt reply. It is greatly appreciated.

Best regards
Niluka Perera Ekanayake
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#11

04 May 2020, 07:27

Dear NILUKA PERERA EKANAYAKE,

For the RESET test, you should not take the exponential of the predictions that you include in the second regression. If you want to use your model to get fitted values or to make predictions, then use the option "n", which is the same as taking the exponential of the values obtained with the option "xb".

Best wishes,

Joao
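
The RESET steps discussed in this thread can be sketched as follows on simulated data. The variables y, x1, x2 and id are hypothetical, and the sketch assumes ppml has been installed (ssc install ppml). The key point is that the squared term is built from the linear index (xb), not from the fitted values in levels.

```stata
* Requires the user-written command: ssc install ppml
clear
set seed 12345
set obs 1000
generate double x1 = rnormal()
generate double x2 = rnormal()
generate double y  = rpoisson(exp(1 + 0.5*x1 - 0.3*x2))
generate id = ceil(_n/5)                 // hypothetical cluster identifier

* Step 1: estimate the model and save the linear index
ppml y x1 x2, cluster(id)
predict double fit_xb, xb

* Step 2: add the squared linear index and re-estimate
generate double fit_xb2 = fit_xb^2
ppml y x1 x2 fit_xb2, cluster(id)

* Step 3: RESET test; a small p-value points to misspecification
test fit_xb2 = 0
```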
NILUKA PERERA EKANAYAKE

Join Date: Apr 2020

Posts: 12
#12

02 Oct 2020, 02:13

Dear Professor Santos,

Thank you for the help you have given me so far. There are still a few points that are unclear to me. I would greatly appreciate it if you could clarify the following.

1. I'm using the gravity equation as follows to test the income effect of trade.
ppml totaltrade ldisyear1 ldisyear2 ldisyear3 ldisyear4 …………bpop_o bpop_d diso3_o* diso3_d* dyear*, cluster(idpair)
predict fitted_ppml_trade, xb
gen Trade_predict=exp(fitted_ppml_trade)

Then I use Trade_predict in the income regression (as the instrument for the actual trade value in the IV estimation).

I want to know whether the code above is correct. In your previous post you mentioned using the option "n". I tried it, but I'm not sure how to use it. Could you please send me the code?
Furthermore, since we do not use the log of the trade value in ppml, why do we need to take the exponential when we compute the predicted trade value?

2. I want to use pair fixed effects in ppml. However, because the data set contains a very large number of pairs, Stata does not allow me to generate the dummy variables for the pairs. Since ppml needs dummy variables, could you please suggest an alternative way to add pair fixed effects?

Your help is greatly appreciated.

Thank you
Niluka
Joao Santos Silva

Join Date: Apr 2014

Posts: 2962
#13

02 Oct 2020, 04:36

Dear NILUKA PERERA EKANAYAKE,

Sorry, the right option in this context is "mu"; if you do that you do not need to take the exponential of the predicted values.
To estimate models with fixed effects, I suggest you use the commands ppmlhdfe or xtpoisson with the fe option.

Best wishes,

Joao
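
A short sketch of both points, on simulated data. The names trade, x, idpair and year are hypothetical stand-ins, and the sketch assumes ppml and ppmlhdfe are installed (ppmlhdfe also needs ftools and reghdfe). The first part shows that "mu" gives fitted trade in levels and matches the exponential of "xb"; the second part estimates pair fixed effects without creating any dummy variables.

```stata
* Requires user-written commands: ppml, ppmlhdfe (with ftools and reghdfe)
clear
set seed 12345
set obs 200                              // 200 hypothetical country pairs
generate idpair = _n
generate double pair_effect = rnormal()
expand 5                                 // 5 years per pair
bysort idpair: generate year = _n
generate double x = rnormal()
generate double trade = rpoisson(exp(1 + 0.5*x + 0.5*pair_effect))

* (1) Fitted values in levels: mu versus exp(xb)
ppml trade x, cluster(idpair)
predict double fit_mu, mu
predict double fit_xb, xb
generate double fit_exp = exp(fit_xb)
summarize fit_mu fit_exp                 // the two should agree

* (2) Pair and year fixed effects without dummy variables
xtset idpair year
xtpoisson trade x i.year, fe vce(robust)
ppmlhdfe trade x, absorb(idpair year) vce(cluster idpair)
```

Both commands in part (2) condition out or absorb the pair effects, so the number of pairs is not limited by how many dummy variables Stata can hold.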
NILUKA PERERA EKANAYAKE

Join Date: Apr 2020

Posts: 12
#14

02 Oct 2020, 05:11

Dear Prof. Santos,

Thank you so much. Your quick response is greatly appreciated.

Niluka
NILUKA PERERA EKANAYAKE

Join Date: Apr 2020

Posts: 12
#15

04 Oct 2020, 04:10

Dear Prof. Joao Santos Silva,
Thank you for your valuable advice so far. I'm working with ppmlhdfe as you advised. However, I was not able to generate out-of-sample predictions using the following commands. That is, I want to use the geographic variables to generate fitted values for country pairs that did not actually trade in a particular year.

ppmlhdfe trade_actual ldistw larea lpop border, a(imp#year exp#year, save) standardize_data(0) d vce(cluster idpair) nolog

predict fitppmlhd2, mu
gen trade_predict_TT_2=fitppmlhd2

If you could advise me how to obtain out of sample predictions, that would be a great help.

Thanks a lot
Niluka