Dear Statalists,
I have two panels, each following thousands of individuals (ID) over a 35-year period, so there is a substantial number of observations in both datasets. Ultimately, I want to estimate the effect of a given exogenous shock in year t=T on the possession of a given asset, measured as a count variable. I am conducting the estimations separately for each panel.
Since I am working with longitudinal data, it seemed appropriate to control for individual fixed effects using a negative binomial regression. However, after reading about -xtnbreg, fe-, it became clear to me from the works of Allison and Waterman (2002), Greene (2007) and Guimaraes (2008) that this command does not deliver a true fixed-effects estimation: the individual effects enter only through the dispersion parameter rather than the conditional mean. Apparently, using -xtnbreg, fe- also leaves the model vulnerable to the "incidental parameters bias".
The most straightforward solution to the problem is to run -nbreg- with individual dummies to control for the fixed effects. Although this is computationally demanding given how many individuals I have, I managed to run the regression in one of my panels, both for the main sample and for smaller subsamples of interest. Stata had no problem converging to an optimum.
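For concreteness, a stripped-down version of what I ran looks roughly like this (asset_count, shock, x1 and x2 are placeholders for my actual variables):

* dummy-variable negative binomial: individual fixed effects via i.ID
* (placeholder variable names; standard errors clustered on ID)
nbreg asset_count i.shock x1 x2 i.ID, vce(cluster ID)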
My problem starts when I run the exact (the exact!) same model specification on the second panel. Stata does not converge. In fact, it gets stuck in a non-concave region of the likelihood function. I decided to add each variable separately to identify the culprit: Stata runs the model without trouble until I add i.ID. The moment I add the individual dummies is the moment Stata fails to maximize the likelihood. I tried taking a random subset of the main sample, hoping that a smaller sample size would make things easier for Stata; no success. I tried changing the optimization technique and tested them all (nr, bhhh, dfp and bfgs), but none worked. I even tried relaxing the convergence criteria through -tolerance()-, but my problem really is the lack of concavity.
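For the record, the variations I tried were along these lines (same placeholder names; these are the standard -maximize- options of -nbreg-):

* switching optimizers and relaxing the stopping rules
nbreg asset_count i.shock x1 x2 i.ID, technique(bhhh) difficult
nbreg asset_count i.shock x1 x2 i.ID, technique(bfgs 10 nr 10) iterate(500)
nbreg asset_count i.shock x1 x2 i.ID, tolerance(1e-4) ltolerance(1e-4)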
Additionally, I tried -ppml- to check whether my specification had a likelihood that could be maximized at all. While -ppml- does not allow me to include factor variables, it did suggest that the specification was fine and that the culprit was, indeed, i.ID. I also read Santos Silva & Tenreyro (2010, 2011) and tried to correct the scaling and magnitude of the covariates, though I do not think this was a problem in my dataset; again, no success with convergence.
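The check was essentially the following (-ppml- is the user-written Santos Silva & Tenreyro command from SSC, so no factor-variable notation; the rescaling lines just illustrate what I attempted):

* test whether the Poisson likelihood for this specification can be maximized
ssc install ppml
ppml asset_count shock x1 x2
* rescale a covariate in case its magnitude causes numerical problems
summarize x1
generate double x1_std = (x1 - r(mean)) / r(sd)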
In closing, it does not seem that I will be able to obtain results with this method for the second panel. With all that considered, what would be the second-best approach? Could I simply use -xtpoisson, fe- and argue that it is a good approximation to -nbreg- with individual dummies? If so, what would be the best references to motivate that choice?
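Concretely, the fallback I have in mind is something along these lines (placeholder names again; as far as I understand, -vce(robust)- after -xtpoisson, fe- gives standard errors clustered on the panel variable in recent Stata releases):

* fixed-effects Poisson as the second-best approach
xtset ID year
xtpoisson asset_count i.shock x1 x2, fe vce(robust)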
Mind you, I do not want to compute probabilities for particular counts. My main interest is the direction and significance of the covariates, especially the exogenous shock. As long as I can trust the p-values, the direction of the coefficients, and their sizes relative to their counterparts in the regression, I am happy.
I would like to thank you in advance for any further input.
Best,
\igor
-----
References:
Allison, P.D. and Waterman, R.P., 2002. Fixed-effects negative binomial regression models. Sociological Methodology, 32(1), pp.247-265.
Greene, W., 2007. Functional form and heterogeneity in models for count data. Foundations and Trends in Econometrics, 1(2), pp.113-218.
Guimaraes, P., 2008. The fixed effects negative binomial model revisited. Economics Letters, 99(1), pp.63-66.
Santos Silva, J.M.C. and Tenreyro, S., 2010. On the existence of the maximum likelihood estimates in Poisson regression. Economics Letters, 107(2), pp.310-312.
Santos Silva, J.M.C. and Tenreyro, S., 2011. poisson: Some convergence issues. Stata Journal, 11(2), pp.207-212.