You can never be sure of that, but you can test whether there is evidence of misspecification. The easiest way is perhaps to do a RESET test. After you estimate the model, get the fitted values with predict and the option xb, square them, and estimate your model again adding the new variable to the list of regressors. If the coefficient on the squared fitted values is significant (using suitable standard errors), that is evidence of misspecification.
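In Stata, a minimal sketch of that procedure might look like this, where y, x1, x2, id, and t are placeholders for your own outcome, regressors, and panel identifiers:

Code:
    xtset id t
    xtpoisson y x1 x2, fe vce(robust)
    predict double xbhat, xb          // fitted values of the linear index
    generate double xbhat2 = xbhat^2  // squared fitted values
    xtpoisson y x1 x2 xbhat2, fe vce(robust)
    test xbhat2                       // significance => evidence of misspecification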
Joao Santos Silva: The p-value on the predicted values squared comes in at 0.15 to 0.20 depending on the model I run, so I am glad that there is only weak evidence of misspecification.
I plotted the predicted values against the actual values, and it appears that xtpoisson fails to predict the vast majority of zeros even though I have many variables with good p-values. Why might this be?
Also, I remember you mentioned that the goodness-of-fit measures are not relevant as long as the conditional expectation assumption is satisfied, but if you had to choose among a variety of different versions of the same model fit with xtpoisson, how would you choose?
My research involves investigating the effect of the lag period for each independent variable and selecting the optimal model, so comparing models is of great interest to me. As far as I can tell, xtpoisson does not have prepackaged goodness-of-fit calculations; there is no estat gof or anything similar.
I saw in another post that you said AIC and BIC are not valid or useful in this context; is that indeed the case here?
Regarding #18, I am not sure how you are obtaining the predictions, because for that you would need the fixed effects, which are not estimated. Anyway, the model will never predict zeros; it will only predict small values.
About #19, just estimate a model with all the variables and use t and F tests to drop the ones that are not relevant. Indeed, AIC and BIC are not useful in this context because they are likelihood-based.
On #19, got it! I will try that. Taking it a step further, how would you select between two relevant competing models? The best t or F test between the two? I was considering minimizing the deviance, but it sounds like that might not be the right approach. There is the Wald chi2, which I could maximize. Then there is the log pseudolikelihood.
Predict with the option xb only gives you the fitted values of the linear index, not a prediction of the outcome. You cannot get a valid prediction of the outcome with this method.
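For example (xbhat is a placeholder name):

Code:
    predict double xbhat, xb   // the linear index xb only
    * exp(xbhat) omits the fixed effects, so it is not a valid prediction of the outcome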
Do not use the deviance in this context. If you want to choose between a model with regressor a and a model with regressor b, estimate a model with both a and b and use a t test to see if you can drop either of them. If you want to do this with sets of regressors, use F tests.
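A minimal sketch of this in Stata, with a, b, and x1 as placeholder regressor names:

Code:
    xtpoisson y a b x1, fe vce(robust)
    test a          // can a be dropped on its own?
    test b          // can b be dropped on its own?
    testparm a b    // joint (F-type) test for the set of regressors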
Hi Joao Santos Silva, thanks for your help with this, I really appreciate it!
Would the deviance still be a bad choice if I used the non-normalized version of my counts? They are integers ranging from 0 to 6, with 97% of the time-series observations at zero.
Does the Poisson approach care if the observations within each entity are autocorrelated? Some of my entities display very mild and sporadic autocorrelation with lags out to 30 days, but the vast majority of the entities have no autocorrelation.
With respect to your recommendation in #22, wouldn't a t test or F test depend on some sort of normality assumption?
Also, something that crossed my mind is the following: if the fixed effects Poisson regression is incapable of making valid predictions in this context, then why are the incidence rate ratios and associated p-values valid? It seems like the two would go hand in hand.
Hi Joao Santos Silva, I have now read through the relevant chapters in Wooldridge's textbook [1], and I have a question for you.
I understand that Poisson and gamma regression are "fully robust to distributional misspecification other than the conditional mean" [1], and that Poisson is the most efficient of these. I am now in the process of performing specification testing on my models.
However, my dependent variable is a corner solutions response, as you immediately diagnosed in your first post on this thread. Why doesn't Wooldridge's textbook mention Poisson regression as an appropriate strategy for a corner solutions response? He goes through Tobit (Types I and II), two-part models, the truncated normal hurdle model, the lognormal hurdle model, and the two-limit Tobit model.
[1] Wooldridge JM. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: The MIT Press; 2010.
Only Jeff can answer that, but I guess that is because at the time the book was written Poisson regression was not popular in this context. Judging by the discussions in this forum (for example, here), the next edition may cover it.
Best wishes,
Joao
PS: To be precise, Poisson PML may or may not be more efficient than gamma PML, but only Poisson PML is valid with fixed effects.
I am reading Jeff Wooldridge's chapter ("Quasi-Likelihood Estimators for Count Data") in [1], and I notice that he says in Section III ("Panel Data Methods") that the methods in that section have nice statistical properties for panel datasets where N (the number of cross-sectional units) is much larger than T (the number of time observations); the fixed effects Poisson estimator is included in that section. Should I be concerned that my data has T = 943 and N = 10-20? He also mentions that the methods may work well in other cases, but that doesn't reassure me!
Also, is stationarity of the time-series aspect of the panel variables important? I read in [2] that stationarity is a commonly accepted indicator that the time series is weakly persistent, which is an assumption of OLS regression on time series (though that indicator is not always right). I have not seen stationarity discussed in the context of Poisson regression on panel data in the sources I have reviewed so far, though I may have missed it.
[1] Pesaran MH, Schmidt P, eds. Handbook of Applied Econometrics Volume II: Microeconomics. Oxford, UK: Blackwell Publishing Ltd; 1999. doi:10.1111/b.9780631216339.1999.x
[2] Dougherty C. Introduction to Econometrics. 3rd ed. Oxford: Oxford University Press; 2007.
Having T much larger than N does not affect the properties of the Poisson FE estimator, but it makes regular Poisson with dummies for each unit almost as appealing (they give the same results, but using the second approach is not practical when N is large).
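For concreteness, a sketch of that equivalence with placeholder names:

Code:
    xtpoisson y x1 x2, fe     // conditional fixed-effects Poisson
    poisson y x1 x2 i.id      // pooled Poisson with a dummy for each unit
    * the slope estimates coincide; only the reported standard errors may differ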
With large T stationarity is always an issue; that is true for Poisson or any other form of regression. So, you need to consider that, and see if it is a problem in your application.