Hi everyone!
I am new to this forum and to Stata, and I have a question about my research project, specifically about panel-data count models.
I am trying to investigate whether China's comparative advantage (measured by the RCA index) influences Chinese cross-border M&As, and whether the comparative advantage of the host nation also influences them. Specifically, my panel dataset contains the number of Chinese cross-border M&A projects in each industry (1-12), in 93 host nations, from 1992 to 2016 (Yijt). I am examining whether Chinese cross-border M&As go to industries in which China has a comparative advantage or to industries in which the host nation has a comparative advantage.
My dependent variable is a count: the number of investment projects in each host country (i), in each sector (j), in a given year (t) (count variable Y1). My data form a panel from 1992 to 2016, covering 12 industries and 93 host nations. Specifically, my model is:
Chinese CBMAs_ijt = constant + RCAChina_jt + RCAhost_ijt + Controls_it + u_ijt
I have two main explanatory variables (RCA China and RCA host), both of which vary by industry. I also include ten host-country determinants as controls, time dummies, and two interaction terms.
I define a three-dimensional panel data structure:
egen panelid = group(country_id industry)
xtset panelid Year
I am currently running these models:
(1) Poisson and negative binomial with fixed effects (xtpoisson, fe and xtnbreg, fe)
(2) Pooled negative binomial count regression (nbreg). In this case the panel structure is ignored and the data are pooled.
(3) PPML (-ppml-) by Santos Silva & Tenreyro (2006)
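For concreteness, the three specifications listed above might be written roughly as in the sketch below. The variable names cbma, rca_china, rca_host, ctrl1 and ctrl2 are placeholders (not the names in the actual dataset), the two controls stand in for the full set of ten, and the interaction terms are omitted. -ppml- is a user-written command and has to be installed from SSC first.

```stata
* Placeholder names: cbma is the count outcome; ctrl1 and ctrl2 stand in
* for the host-country controls. Assumes xtset panelid Year has been run.

* (1) Fixed-effects Poisson and fixed-effects negative binomial
xtpoisson cbma rca_china rca_host ctrl1 ctrl2 i.Year, fe vce(robust)
xtnbreg   cbma rca_china rca_host ctrl1 ctrl2 i.Year, fe

* (2) Pooled negative binomial: the panel structure is ignored
nbreg cbma rca_china rca_host ctrl1 ctrl2 i.Year

* (3) PPML (install once with: ssc install ppml)
* xi: expands i.Year into dummies, in case -ppml- does not accept
* factor-variable notation directly
xi: ppml cbma rca_china rca_host ctrl1 ctrl2 i.Year
```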
When I run the Poisson or negative binomial model with fixed effects, many observations are dropped because their panels have all-zero outcomes. I have thought of using zero-inflated Poisson (ZIP); however, I cannot find a Stata command for ZIP that is designed for panel data. The dependent variable has a large number of zeros because there are no Chinese cross-border M&As in many year, industry, and host-nation combinations.
Do you think the PPML estimator is suitable for my analysis, and if so, why? Is it valid to argue that PPML is more suitable because of the excess zeros in my dependent variable (97%)?
Also, another paper in my area uses a pooled count estimator (nbreg without the xt- specification). Do you think it is better to ignore the panel structure of my data and use a pooled count estimator, or could I use PPML?
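The fixed-effects count estimators drop every panel whose outcome is zero in all years, because such panels contribute nothing to the conditional likelihood. A minimal sketch for checking how many panels and observations are affected, assuming the count outcome is stored in a variable called cbma (a placeholder name) and panelid is the panel identifier defined earlier:

```stata
* cbma is a placeholder name for the count outcome
bysort panelid: egen total_cbma = total(cbma)   // total deals per country-industry panel
egen byte tag_panel = tag(panelid)              // flags one observation per panel

count if tag_panel & total_cbma == 0   // number of all-zero panels
count if total_cbma == 0               // number of observations in those panels
```

The second count should be close to the number of observations that xtpoisson, fe reports as dropped because of all-zero outcomes.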
Thank you very much for your help! Any advice, literature references, or explanations would be highly appreciated.