PPML and Pooled OLS count regression model

Lida Metallinou

Join Date: Mar 2018

Posts: 16
#1

09 Oct 2018, 09:56

Hi everyone!
I am new to this forum and to Stata, and I have a question about my research project, specifically about count models for panel data.

I am trying to investigate whether China's comparative advantage (measured by the RCA index) influences Chinese cross-border M&As, and whether the comparative advantage of the host nation does as well. In other words, I am examining whether Chinese cross-border M&As go to industries in which China has a comparative advantage or to industries in which the host nation has one.
My dependent variable (Y_ijt) is a count: the number of cross-border M&A projects in host country i, in sector j, in year t. My panel dataset covers 12 industries and 93 host nations from 1992 to 2016. Specifically, my model is

Chinese CBMAs_ijt = constant + RCAChina_jt + RCAhost_ijt + Controls_it + u_ijt

I have two main explanatory variables (RCA China and RCA host), both of which vary by industry. I also include ten host-country determinants as controls, time dummies and two interaction terms.

I define a three-dimensional panel data structure:
egen panelid = group(country_id industry)
xtset panelid Year

I am currently running these models:
(1) Poisson and NBreg with fixed effects (xtpoisson, fe)
(2) Pooled OLS count regression model (nbreg). In this case the panel structure is ignored and the data are pooled.
(3) PPML (-ppml-) by Santos Silva & Tenreyro (2006)

When I run the Poisson or NBreg with FE, a lot of observations are dropped because of all-zero outcomes. I have thought of using a zero-inflated Poisson (ZIP) model; however, I cannot find a Stata command for ZIP designed for panel data. The dependent variable has a large number of zeros because I do not observe Chinese cross-border M&As in every year, industry and host nation.
Do you think the PPML estimator is suitable for my analysis and, if so, why? Is it valid to argue that PPML is more suitable because of the excess zeros in my dependent variable (97%)?
Also, another paper in my area uses a pooled OLS count estimator (nbreg without the -xt- specification). Do you think it is better to ignore the panel structure of my data and use a pooled OLS count estimator, or could I use PPML?

Thank you very much for your help! Any advice, literature reference and explanation would be highly appreciated.
Joao Santos Silva

Join Date: Apr 2014

Posts: 2997
#2

09 Oct 2018, 13:24

Dear Lida Metallinou,

I think there are some confusions in your post (e.g., xtpoisson does not estimate a NBreg model and pooled OLS is not a count regression). Anyway, given that your dependent variable is a count, I would recommend that you use Poisson regression with FE. This is a very robust approach that is likely to give you sensible results even if the data is not Poisson. See

Wooldridge, Jeffrey, (1999), Distribution-free estimation of some nonlinear panel data models, Journal of Econometrics, 90, issue 1, p. 77-97.

I would stay away from zero inflated models in this context.

Best wishes,

Joao
Lida Metallinou

Join Date: Mar 2018

Posts: 16
#3

10 Oct 2018, 03:30

Dear Professor Silva,
Thank you very much for your quick reply.
Apologies for the confusion; I am quite new to applied econometrics with Stata.

I first ran <xtpoisson dep.var. indep.var. year dummies, fe> and <xtnbreg dep.var. indep.var. year dummies, fe>. In both cases, a lot of observations (11,367) were dropped because of all-zero outcomes. Also, in both cases the Wald chi2 and Prob > chi2 are missing.

A research paper in the related literature faces the same problem and, as the authors describe, instead of using "fixed effect Poisson/Negative Binomial Regression" they run "Count pooled regression with year and country dummies".
As a result, I tried the same approach and ran <nbreg dep.var indep.var year dummies> and <poisson dep.var indep.var year dummies>, which did not drop any zero observations.

Thank you for your suggestion to use Poisson regression with FE. May I ask specifically if you mean to run one of the following options?
(1) xtpoisson dep.var. indep.var. year dummies, fe
(2) ppml dep.var. indep.var. year dummies, cluster(id)
(3) poisson dep.var. indep.var. year dummies, vce(cluster id)

From my understanding, options (2) and (3) give the same results and seem to work well.

Thank you very much for your help,
Joao Santos Silva

Join Date: Apr 2014

Posts: 2997
#4

10 Oct 2018, 07:19

Dear Lida Metallinou,

It is natural that a lot of observations are dropped when you run xtpoisson with FE; that is fine, as those observations are not informative about the parameters of interest, so dropping them or keeping them makes no difference. I suggest you stay away from xtnbreg.
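
One way to see why the all-zero groups carry no information, sketched under the standard setup for the conditional (fixed-effects) Poisson estimator. The symbols $\alpha_i$, $x_{it}$, $\beta$ and $n_i$ are notation introduced here, not taken from the thread. Assume $E[y_{it} \mid x_i, \alpha_i] = \alpha_i \exp(x_{it}'\beta)$. Conditioning on the group total $n_i = \sum_t y_{it}$ removes $\alpha_i$, and group $i$ contributes

```latex
\ell_i(\beta) \;=\; \sum_{t} y_{it}\,
  \log\!\left( \frac{\exp(x_{it}'\beta)}{\sum_{s} \exp(x_{is}'\beta)} \right)
  \;+\; \log\!\left( \frac{n_i!}{\prod_{t} y_{it}!} \right)
```

to the log-likelihood. If $y_{it} = 0$ for every $t$, then $n_i = 0$ and both terms are zero for any value of $\beta$, so such a group cannot move the estimate whether it is kept or dropped.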

I do not know what "Count pooled regression with year and country dummies" is, so I still recommend Poisson regression with FE.

Indeed, in this context (2) and (3) are the same. (1) includes fixed effects for id, which are not included in (2) and (3). It is up to you to decide whether these FE are needed.
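
For readers who want to try the three options side by side, here is a minimal sketch on simulated data. Everything in it (variable names, sample size, parameter values) is made up for illustration and is not the poster's dataset; ppml is the user-written command and is assumed to be installed from SSC.

```stata
* Simulated panel: 50 groups observed for 10 years (hypothetical data)
clear
set seed 2018
set obs 50
generate id = _n
generate a = rnormal()                  // group-specific effect
expand 10
bysort id: generate year = 1991 + _n    // years 1992-2001
generate x = rnormal()
generate y = rpoisson(exp(-1 + 0.5*x + a))

* Option (3): pooled Poisson with year dummies, SEs clustered by group
poisson y x i.year, vce(cluster id)

* Option (2): the same model with ppml (ssc install ppml)
* Year dummies are built by hand so no factor-variable support is assumed
quietly tabulate year, generate(yr_)
ppml y x yr_2-yr_10, cluster(id)

* Option (1): conditional fixed-effects Poisson for the id groups
xtset id year
xtpoisson y x i.year, fe vce(robust)
```

Options (2) and (3) fit the same pooled model, so their coefficients should agree; option (1) additionally absorbs one effect per id and will generally give different estimates when those effects are correlated with the regressors.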

Best wishes,

Joao
Lida Metallinou

Join Date: Mar 2018

Posts: 16
#5

11 Oct 2018, 04:26

Dear Professor Silva,

Thank you very much for your reply and time.
Do you think that going for option (2) or (3) without including FE is totally wrong, given that my dependent variable and my two independent variables are in country-sector-year form? As I mentioned, I am trying to model the number of M&As in each industry, in each country, every year, but the majority are zeros.

From a previous post https://www.statalist.org/forums/for...binomial-model I understood that <ppml> shares some similarities with the Poisson regression with FE, <xtpoisson Y X, fe>. I am trying to understand the difference between the PPML estimator and the Poisson FE estimator and decide which best suits my data.

Lastly, how can PPML give the same results as xtpoisson? From reading previous posts I understood that the following options should give the same results:
<xtpoisson Y X, fe>, where the panel id is countries * industries. In this case I understand that the model includes a very large number of fixed effects.
<xi:ppml Y X country dummies, industry dummies>

Thank you for your time and help,

Kind Regards,
Lida
Joao Santos Silva

Join Date: Apr 2014

Posts: 2997
#6

11 Oct 2018, 15:20

Dear Lida Metallinou,

1 - Whether or not to include the fixed effects is a modelling decision that only you can take.

2 - By default, ppml does not include any fixed effects, whereas xtpoisson does. If you manually include the fixed effects in ppml, the results are identical to those obtained with xtpoisson.

3 - Both ppml and xtpoisson estimate a Poisson regression and they produce the same results when the same regressors (including FE) are used.
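
A small simulated check of this equivalence, as a sketch only: the data and names are hypothetical, and official poisson with one dummy per panel unit stands in for ppml with hand-made dummies, which fits the same model. The dummies must be for the same identifier that xtset declares as the panel variable.

```stata
* Hypothetical panel: 30 groups, 8 periods each
clear
set seed 2018
set obs 30
generate id = _n
generate a = rnormal()                  // group-specific effect
expand 8
bysort id: generate year = 1991 + _n
generate x = rnormal()
generate y = rpoisson(exp(0.5*x + a))

* Drop all-zero groups up front: xtpoisson, fe would discard them anyway,
* and their dummies have no finite estimate in the dummy-variable model
bysort id: egen ytotal = total(y)
drop if ytotal == 0

* Conditional fixed-effects Poisson
xtset id year
xtpoisson y x, fe
display "xtpoisson, fe     : " _b[x]

* Poisson with the fixed effects included manually as dummies
poisson y x i.id
display "poisson + dummies : " _b[x]
```

The two displayed slope coefficients should coincide up to numerical tolerance. Standard errors are a separate matter and depend on the vce() option chosen in each command.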

Best wishes,

Joao
Lida Metallinou

Join Date: Mar 2018

Posts: 16
#7

12 Oct 2018, 06:33

Dear Professor Silva,

Thank you very much for your reply and for your time.
So, the difference between PPML and <xtpoisson, fe> is the fixed effects. Both are applicable to count panel data models.

One last query concerns a different dependent variable in the above model. Which estimation technique do you think would be best if, instead of modelling the number of M&As (a count of investment projects in host country i, sector j and year t), I use a different but similar dependent variable, namely the value of M&As? In that case the dependent variable is continuous, and it also has a lot of zeros. My original model remains the same; I just want to change the dependent variable.

Thank you again very much for your helpful advice,

Kind Regards,
Lida
Joao Santos Silva

Join Date: Apr 2014

Posts: 2997
#8

14 Oct 2018, 11:11

Dear Lida Metallinou,

It may make sense to use the same regressors and the same estimator for the value (rather than the number) of the M&A; the large number of zeros is no problem.
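
As an illustration of this point, a minimal sketch with a simulated non-negative continuous outcome that is zero about 70% of the time. The variable name value, the data-generating process and all parameter values are assumptions made for the example only.

```stata
* Hypothetical panel: 50 groups, 10 periods each
clear
set seed 2018
set obs 50
generate id = _n
generate a = rnormal()                  // group-specific effect
expand 10
bysort id: generate year = 1991 + _n
generate x = rnormal()

* Continuous outcome with many exact zeros; its conditional mean is
* proportional to exp(0.5*x + a) because rgamma(2, 0.5) has mean 1
generate value = cond(runiform() < 0.7, 0, exp(0.5*x + a) * rgamma(2, 0.5))

* Same estimator as for the count; robust SEs because nothing here
* makes the variance equal to the mean
xtset id year
xtpoisson value x i.year, fe vce(robust)
```

The estimator only needs the conditional mean to be correctly specified, which is why the same command can be reused; groups whose value is zero in every period are dropped for the same reason as in the count case.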

Best wishes,

Joao
Lida Metallinou

Join Date: Mar 2018

Posts: 16
#9

16 Oct 2018, 04:05

Dear Professor Silva,

Thank you very much for your reply,
So, Poisson regression with FE and PPML with FE are applicable to continuous variables as well as counts?

Best wishes,
Lida
Joao Santos Silva

Join Date: Apr 2014

Posts: 2997
#10

16 Oct 2018, 07:24

Dear Lida Metallinou,

That is correct.

Best wishes,

Joao
Lida Metallinou

Join Date: Mar 2018

Posts: 16
#11

17 Oct 2018, 03:06

Thank you very much for your helpful advice.

Lida
Lida Metallinou

Join Date: Mar 2018

Posts: 16
#12

30 Nov 2018, 05:38

Originally posted by Joao Santos Silva

Dear Lida Metallinou,

It may make sense to use the same regressors and the same estimator for the value (rather than the number) of the M&A; the large number of zeros is no problem.

Best wishes,

Joao

Dear Professor Silva,

You mentioned in one of your previous posts that xtpoisson with FE and PPML are applicable to count and continuous variables. Could you give me a reference for that?

Thank you very much,
Joao Santos Silva

Join Date: Apr 2014

Posts: 2997
#13

30 Nov 2018, 06:40

Dear Lida Metallinou,

Sure, please see:

Santos Silva, J.M.C. and Tenreyro, Silvana (2006), The Log of Gravity, The Review of Economics and Statistics, 88(4), pp. 641-658.

Best wishes,

Joao
Lida Metallinou

Join Date: Mar 2018

Posts: 16
#14

02 Dec 2018, 02:24

Dear Professor Silva,

Thank you very much for this.

Best wishes,
Lida
Lida Metallinou

Join Date: Mar 2018

Posts: 16
#15

03 Dec 2018, 04:18

Dear Professor Silva,

One more query: in previous posts on Statalist you mention that overdispersion in the context of xtpoisson with FE or PPML is not a serious issue unless you want to compute probabilities of certain counts. Is there a reference that I could use for that, since I will use this estimation technique?
Thank you