ppml_panel_sg and RESET test

Dimitrios Dadakas

Join Date: Mar 2019

Posts: 5
#1

ppml_panel_sg and RESET test

07 Mar 2019, 00:27

Dear Statalist,

I am trying to estimate the RESET and the Park test with the ppml_panel_sg command. I have already managed to get the same coefficient parameters using both the 'ppml' and 'ppml_panel_sg' commands, Once, however, I include in the model both exporter/importer-time FE and country pair FE the simple 'ppml' command,does not produce the standard errors or the RESET test. So I had to go to the faster ppml_panel_sg command. A similar code to the one I am presenting below works fine with the simple 'ppml' command. The following code that employs ppml_panel_sg includes exporter/importer-time-FE and country pair FE.

/// code starts here
set more off
ppml_panel_sg xftaif exporter!=importer, ex(exporter) im(importer) y(year) pred(predicten4) robust
estimates store R7, title(PPML with Exporter/ Importer Time FE and CP FE)
predict fit1,xb
*predict ehat1, res
gen fit2=fit1^2

quietly ppml_panel_sg x fta fit2 if exporter!=importer, ex(exporter) im(importer) y(year)robust
test fit2=0
* next lines are wrong
gen ehat1=x-fit1
gen ehat2=ehat1^2
glm ehat2fit1if exporter!=importer, cluster(pair) family(poisson) diff iter(30)
test fit1=1
test fit1=2

drop fit1 fit2 ehat1 ehat2

/// code ends here

I am not sure that the RESET test (and the predict command) is estimating what it is supposed to. I cannot get a value for the RESET as the fitted values are now 0 (for all those observations that have FTA=0) and a constant (0.944 - for all those observations that have FTA=1). Is this an issue with the link function. Should the pred() command? Can I get the residual values somehow for the park test (because getting them as 'gen ehat1=x-fit1' is wrong since predicted and fitted values are different)?

Thanks in advance for any input
Tags: None
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#2

07 Mar 2019, 15:42

HI Dimitrios,

For the RESET test, you need the predicted values from the ppml regression. In your 2nd line, it appears these are being saved as a variable called "predicten4". If you replace "fit1" on your fourth line with "predicten4" you should get a different result (in particular, the observations with FTA=0 will not have a predicted value of 0.) This is also an issue for your Park test.

Hope this is helpful!

Tom
Comment
Dimitrios Dadakas

Join Date: Mar 2019

Posts: 5
#3

08 Mar 2019, 01:47

Hello Pr. Tom,
I really appreciate your input and it partially solved my problem. To double check that I am doing this correctly, I created some mock data (with only 500 panel observations and only exports and fta so that the program can ran fast) and I tried to replicate the RESET and Park results from the ppml command with the ppml_panel_sg command.Parameter estimates are the same.However, although the predicted values from the ppml_panel_sg command are for the most part the same with the fitted values of the ppml command (once I take the logarithm of the ppml_panel_sg predicted values of course) not all of the observations have a predicted value (with the ppml_panel_sg command) and as a result the park and RESET test differ in the two processes.

I looked at one of your earlier posts https://www.statalist.org/forums/for...ations-droppedand I tried to update the command through SSC but that did not help.Moreover, looking at your answer (11 Nov 2016, 02:53) if there was an issue with positive observations, this problem should be present with both the ppml and the panel_ppml_sg command.

I must be missing something.I am attaching the data and the program in case that helps. In any case, I really appreciate your input and I will try to solve this myself as well.
Best Regards and thanks again
Dimitrios

/// program starts here

clear
clear mata
clear matrix
set more off

set maxvar 11000
set matsize 11000
*EXPORTS-IMPORTS - - - - - - - - - - - - - - - - - - - - - - - - - -*/

import excel "J:\ddsd\Book1.xlsx", sheet("Sheet1") firstrow

cap egen ye = group(year )
cap egen expo = group(exporter )
cap egen impo = group(importer )
cap egen exp_time = group(exporter year)
cap egen imp_time = group(importer year)
cap egen pair = group(exporter importer)

quietly tabulate ye, gen(YEAR_FE)
quietly tabulate expo, gen(EXPORTER_FE)
quietly tabulate impo, gen(IMPORTER_FE)
quietly tabulate exp_time, gen(EXPORTER_T_FE)
quietly tabulate imp_time, gen(IMPORTER_T_FE)
quietly tabulate pair, gen(COUNTRY_PAIR)

/// model 5 PPML with with Time ex/im FE for importer and exporter - no time effects
set more off

ppml x IMPORTER_T_FE* EXPORTER_T_FE* fta if exporter!=importer, cluster(pair)
estimates store R5a, title(PPML with Time FE)
predict fit1, xb
predict ehat1, res

gen ehat2=ehat1^2
gen fit2=fit1^2

quietly ppml x IMPORTER_FE* EXPORTER_FE* fta fit2 if exporter!=importer, cluster(pair)
test fit2=0

glm ehat2 fit1 if exporter!=importer, cluster(pair) family(poisson) diff iter(30)
test fit1=1
test fit1=2

gen fit1ppml=fit1

drop fit1
drop fit2
drop ehat1
drop ehat2

/// 5b using the ppml-panel-sg command instead, replicate results

set more off
ppml_panel_sg x fta if exporter!=importer, ex(exporter) im(importer) y(year) nopair pred(fitall)
estimates store R5b, title(PPML with Exporter/ Importer Time FE)

gen fit1=ln(fitall)

* this is where I compare fit1 from this process with the fit1ppml from the previous process
gen fit2=fit1^2

quietly ppml_panel_sg x fta fit2 if exporter!=importer, ex(exporter) im(importer) y(year) nopair
test fit2=0
gen ehat1=x-fit1
gen ehat2=ehat1^2
glm ehat2 fit1 if exporter!=importer, cluster(pair) ///
family(poisson) diff iter(30)
test fit1=1
test fit1=2

/// keep fit1 x predicten10a
drop fitall
drop fit1
drop fit2
drop ehat1
drop ehat2

outreg2 [ R5a R5b ] using C:\Laz\myreg5.doc, word replace dec(2) label symbol(***,**,*) alpha(0.01,0.05,0.1) ///
keep( lnyi lnyj lndistw yi yj contig comlang_off colony fta)

drop fit1ppml

/// program ends here

Attached Files

Book1.xlsx (20.6 KB, 1 view)
Comment
Dimitrios Dadakas

Join Date: Mar 2019

Posts: 5
#4

08 Mar 2019, 01:58

Correction: IMPORTER_FE* EXPORTER_FE* --> IMPORTER_T_FE* EXPORTER_T_FE*
Comment
Tom Zylkin

Join Date: Nov 2016

Posts: 188
#5

08 Mar 2019, 08:40

Hi Dimitrios,

Thank you for asking some very interesting questions!

The reason why not all observations have a predicted value is because the estimation only includes observations where "exporter != importer". So it is only the observations for which exporter != importer" where the variables you are creating do not have fitted values. That seems okay, and you can ensure the predicted values from ppml are consistent by (eg) replacing predict fit1, xb with predict fit1 if exporter != importer.

That still does not explain why ppml_panel_sg appears to give a different answer for the RESET test, but the answer there has to do with the warning message it gives when you run the RESET test regression:

Max number of iterations reached before estimates converged. Consider adjusting the maxiter() option

In other words, ppml_panel_sg failed to converge under the default number of iterations. This is probably because there is a very high correlation between fit2 and the original linear predictor fit2 such that the likelihood function is very flat at the point where it is maximized (this is also related to the concept of separation, since the fitted model has some very small fitted values that take a long time to fit correctly and the model is very nearly separated.)

At any rate, I found it is possible to reproduce the output from ppml if you change the options so that ppml_panel_sg has a lower tolerance and a higher number of allowable iterations (the lower tolerance is needed because of how flat the likelihood function is.) However, it takes several minutes to run, which is far from ideal considering regular ppml takes less than a minute here.

Fortunately, there is now a command you can try called ppmlhdfe, which is very robust to issues related to separation. It also uses an IRLS-based algorithm (similar to ppml), which appears to work better with the flatter likelihood. (In general, ppmlhdfe is a more robust algorithm and at some point I will be updating ppml_panel_sg so that it serves as a wrapper for ppmlhdfe for people who prefer ppml_panel_sg's syntax.)

The syntax in that case would be

ppmlhdfe x fta fit2 if exporter !=importer, a(exporter#year importer#year)

which will quickly give you the same estimates as ppml.

For the Park/MaMu test, I found that the answer is the same regardless of which set of predictions / residuals is used. The only issue with your code is that on line 63, you are defining "ehat1" as "x-fit1" instead of "x-fitall".

(EDIT: I made a mistake earlier that suggested glm ehat2 fit1 if exporter!=importer gave the wrong answer. ppmlhdfe, glm, and ppml do all give the same answer.)

Hope this is helpful!

Regards,
Tom

Last edited by Tom Zylkin; 08 Mar 2019, 09:12.
Comment
Dimitrios Dadakas

Join Date: Mar 2019

Posts: 5
#6

08 Mar 2019, 23:09

Dear Pr, Tom,
thank you very much for the very detailed, concise and clear reply. I think this should cover all my problems. I will just go through them and once I get everything working I'll just log back in to say thank you again!
Best Regards,
Dimitrios
Comment
Dimitrios Dadakas

Join Date: Mar 2019

Posts: 5
#7

12 Apr 2019, 08:28

Dear pr. Tom,
as I said, thought I should say thanks again, got everything solved out on that paper. Best Regards,
Dimitrios
Comment
Mulenga Chonzi

Join Date: Aug 2019

Posts: 4
#8

25 Aug 2019, 05:52

Can I run a Pseudo-Poisson Maximum Likelihood in panel data when not using a Gravity Model? What is the STATA command?

I want to run the Pseudo-Poisson Maximum Likelihood (PPML) in a panel data framework as my dependent variable has many zeroes. However, my challenge is that from all the literature I have read on the PPML, it seems to only work in gravity model type of estimation. Is it possible to run a PPML using panel data for a non-gravity type of model? If it is possible, what is the STATA command to use?
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#9

25 Aug 2019, 11:12

Please see here.
Comment

alessio lombini

Join Date: Dec 2020
Posts: 98

#10

21 Jul 2021, 09:08

Dear Joao and Tom,
I am also trying to conduct a RESET test for my gravity model. My variable of interest is regulatory distance, which represents a bilateral measure of heterogeneity in Non-Tariff measures. Following the literature, I include importer_year and exporter_year fixed effects. Moreover, I also include importer and exporter pair fixed effects (pair_id) to account for the endogeneity of the control variables. Unfortunately, the RESET test shows a p-value lower than 5%. However, when I exclude pair_id from the model the coefficient is statistically significant and the Reset Test presents a p-value > 10% . Do you have any suggestions on how to handle this issue? I also tried to include industry and year fixed effects but the results do not change significantly. I attach below the code with the output

Code:

. asdoc ppmlhdfe gvc_twiodn regulatory_distance contig applied_tariff ln_dist co
> mlang_off comlang_ethno comcol comrelig col45 comleg_pretrans sibling rta if e
> xp!=imp, a(exp_time imp_time pair_id) vce(cluster pair_id) replace
warning: dependent variable takes very low values after standardizing (1.7548e-1
> 9)
note: 9 variables omitted because of collinearity: contig ln_dist comlang_off co
> mlang_ethno comcol comrelig col45 comleg_pretrans sibling
Iteration 1:   deviance = 6.4108e+07  eps = .         iters = 7    tol = 1.0e-04
>   min(eta) =  -8.36  P   
Iteration 2:   deviance = 4.4553e+07  eps = 4.39e-01  iters = 7    tol = 1.0e-04
>   min(eta) = -12.10      
Iteration 3:   deviance = 4.2030e+07  eps = 6.00e-02  iters = 6    tol = 1.0e-04
>   min(eta) = -16.49      
Iteration 4:   deviance = 4.1805e+07  eps = 5.38e-03  iters = 5    tol = 1.0e-04
>   min(eta) = -18.55      
Iteration 5:   deviance = 4.1771e+07  eps = 8.11e-04  iters = 5    tol = 1.0e-04
>   min(eta) = -18.75      
Iteration 6:   deviance = 4.1764e+07  eps = 1.59e-04  iters = 3    tol = 1.0e-04
>   min(eta) = -18.75      
Iteration 7:   deviance = 4.1763e+07  eps = 2.80e-05  iters = 2    tol = 1.0e-04
>   min(eta) = -18.75      
Iteration 8:   deviance = 4.1763e+07  eps = 4.03e-06  iters = 3    tol = 1.0e-05
>   min(eta) = -19.37      
Iteration 9:   deviance = 4.1763e+07  eps = 4.48e-07  iters = 3    tol = 1.0e-06
>   min(eta) = -20.07   S  
Iteration 10:  deviance = 4.1763e+07  eps = 3.30e-08  iters = 3    tol = 1.0e-07
>   min(eta) = -20.48   S  
Iteration 11:  deviance = 4.1763e+07  eps = 1.36e-09  iters = 4    tol = 1.0e-08
>   min(eta) = -20.59   S O
--------------------------------------------------------------------------------
> ----------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon b
> elow tolerance)
Converged in 11 iterations and 48 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =    231,609
Absorbing 3 HDFE groups                           Residual df     =      1,481
Statistics robust to heteroskedasticity           Wald chi2(3)    =      37.78
Deviance             =  41762682.75               Prob > chi2     =     0.0000
Log pseudolikelihood = -21289494.28               Pseudo R2       =     0.6698

Number of clusters (pair_id)=      1,482
                             (Std. Err. adjusted for 1,482 clusters in pair_id)
-------------------------------------------------------------------------------
              |               Robust
   gvc_twiodn |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
regulatory_~e |  -.8293929   .7089356    -1.17   0.242    -2.218881    .5600954
       contig |          0  (omitted)
applied_tar~f |   -.034936   .0058947    -5.93   0.000    -.0464894   -.0233826
      ln_dist |          0  (omitted)
  comlang_off |          0  (omitted)
comlang_ethno |          0  (omitted)
       comcol |          0  (omitted)
     comrelig |          0  (omitted)
        col45 |          0  (omitted)
comleg_pret~s |          0  (omitted)
      sibling |          0  (omitted)
          rta |   -.001473   .0410393    -0.04   0.971    -.0819085    .0789626
        _cons |   6.469758   .0663759    97.47   0.000     6.339663    6.599852
-------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
    exp_time |       385           0         385     |
    imp_time |       378          15         363     |
     pair_id |      1482        1482           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation
(note: file Myfile.doc not found)
Click to Open File:  Myfile.doc

. * Perform the RESET Test
. predict fit, xb
(96,411 missing values generated)

. generate fit2 = fit^2
(96,411 missing values generated)

. 
. qui ppmlhdfe gvc_twiodn regulatory_distance contig applied_tariff ln_dist coml
> ang_off comlang_ethno comcol comrelig col45 comleg_pretrans sibling rta fit2 i
> f exp!=imp, a(exp_time imp_time pair_id) vce(cluster pair_id)               

. test fit2 = 0

 ( 1)  fit2 = 0

           chi2(  1) =   42.59
         Prob > chi2 =    0.0000

Since this is for my master thesis, I would sincerely appreciate any help you may provide.
Thank in advance !

Comment

Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#11

22 Jul 2021, 00:20

Dear alessio lombini,

If I understand correctly what you did, in your model the RESET checks for the omission of cross products and squares of the 3 regressors in your model. Maybe you can try to include those and see if the results mane sense and the new model passes the test. Because you are not going to make policy suggestions based on this work, if you still have problems, I think it is OK for you to describe what you did and say that further investigation would be needed.

Best wishes,

Joao
Comment
alessio lombini

Join Date: Dec 2020

Posts: 98
#12

22 Jul 2021, 01:00

Dear Joao,
thank you very much for your help, it is very nice to receive feedback from an economist of your stature. If I well understood, you are suggesting to include the squared terms of these three variables in the model. I was indeed wondering to include them in the model but I was not sure if it was a legit approach in a gravity model since I have never seen this in the literature. Do you think the gravity model does not preclude this possibility?

Also, the reason behind the inclusion of country-pair fixed effects is to control for the endogeneity of these variables. In the literature, I saw that I can get the same result by including the lagged term of the dependent variable in the model. Do you think there is any shortcoming in following this approach in this context?
Thank you again for your precious help.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#13

22 Jul 2021, 02:40

Dear alessio lombini,

Yes, the squares and cross products of those variables. Although that is not standard, there is no reason not to try to see if it improves. I would not use lags.

Best wishes,

Joao
Comment

alessio lombini

Join Date: Dec 2020
Posts: 98

#14

28 Aug 2021, 06:41

Dear Joao Santos Silva,

thank you a lot for your insightful feedback. I tried to include in the model the interaction term between the regulatory distance (RD) and the exporter's Gross Domestic Income (GNI)

Code:

  
 β1 (ln⁡(1+ RD_ijt-1)×GNI_it)

where the GNI is expressed in thousands of US dollar and it is not logged (to help with the interpretation. The model yields the following output

Code:

  asdoc ppmlhdfe gvc_total c.lag1_lnRD_imputed##c.GNIcap_exp lag1_lntariffs lag1_RT > A ln_dist $dummy_list if exp!=imp, a(exp_sector_time imp_sector_time) vce(cluster  > pair_sector_id) replace nest label cnames(overall GVC participation) dec(3) add(RE > SET Test, 0.000, AVE, 0.000) d (dropped 5066 observations that are either singletons or separated by a fixed effect > )    HDFE PPML regression                              No. of obs      =    657,981 Absorbing 2 HDFE groups                           Residual df     =     68,928 Statistics robust to heteroskedasticity           Wald chi2(13)   =    8276.65 Deviance             =  15296318.24               Prob > chi2     =     0.0000 Log pseudolikelihood = -8466648.213               Pseudo R2       =     0.9144  Number of clusters (pair_sector_id)=    68,929                          (Std. Err. adjusted for 68,929 clusters in pair_sector_id) -----------------------------------------------------------------------------------                   |               Robust         gvc_total |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval] ------------------+---------------------------------------------------------------- lag1_lnRD_imputed |  -.1589669   .0238908    -6.65   0.000    -.2057921   -.1121417        GNIcap_exp |          0  (omitted)                   |               c.ln⁡(1+ RDijt-1)#|      c.GNIit |   .0029432   .0005393     5.46   0.000     .0018863    .0040001                   |    lag1_lntariffs |  -.1669346   .0197142    -8.47   0.000    -.2055738   -.1282954          lag1_RTA |   .1113005   .0355856     3.13   0.002      .041554    .1810471           ln_dist |  -.8107724   .0216022   -37.53   0.000    -.8531119   -.7684329            contig |   .3042479   .0290538    10.47   0.000     .2473036    .3611922     comlang_ethno |   .1456843   .0371045     3.93   0.000     .0729608    .2184078            comcol |   -.229223   .1156892    -1.98   0.048    -.4559697   -.0024762          comrelig |  -.0529728   .0549299    -0.96   0.335    -.1606336    .0546879             col45 |   .0751228   .0871865     0.86   0.389    -.0957595    .2460052  comleg_posttrans |   .2575868   .0244418    10.54   0.000     .2096817    .3054919 transition_lega~e |  -.1042719   .0398606    -2.62   0.009    -.1823973   -.0261465           sibling |  -.1060025    .079126    -1.34   0.180    -.2610867    .0490817             _cons |   13.01029   .1793332    72.55   0.000      12.6588    13.36178 -----------------------------------------------------------------------------------  Absorbed degrees of freedom: ---------------------------------------------------------+      Absorbed FE | Categories  - Redundant  = Num. Coefs | -----------------+---------------------------------------|  exp_sector_time |     12087           0       12087     |  imp_sector_time |     11655         210       11445     | ---------------------------------------------------------+

Since there is no much in the literature about the interpretation of I would have two questions: 1) Is it correct to create an interaction between a logged (RD) and non-logged (GNI) variable when implementing a PPML estimator? 2) Since GNI is not logged, do I need to transform the coefficient of the interaction in order to interpret it? Looking forward to your feedback and comments. Thank you in advance for your time

Comment

Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#15

28 Aug 2021, 09:29

Dear alessio lombini,

Are you sure you want to include GDI without logging it? Keep in mind that it is inside the exponential function. Anyway, you can include any interaction that is considered relevant and the interpretation is always the same.

Best wishes,

Joao
Comment

Announcement