Using Panel Dummies for Fixed Effects Negative Binomial

Imran Khan

Join Date: Sep 2017

Posts: 68
#1

Using Panel Dummies for Fixed Effects Negative Binomial

16 Dec 2017, 12:04

Hi,

I have a panel dataset of 50 countries over 10 years. The dependent variable is a count, and given the overdispersion in the data I am running a negative binomial regression in Stata 15/MP. There is a potential structural break in the dependent variable in the year 2005. Moreover, given the heterogeneous set of countries, I am tempted to use fixed effects (I also couldn’t compare fixed and random effects because, to the best of my knowledge, the Hausman test cannot be performed in this setting).

What I have understood so far is that to fit a negative binomial model with fixed effects, panel dummies have to be included instead of using the -fe- option.

Given all this background, I am using the following commands:

Code:

xtset country year
nbreg DV IV 2005.year i.year i.country, vce(cluster country)

I have some questions about the above commands that I have not been able to resolve despite reading the literature and the Stata manual.

Q1) Am I using the right command?
Q2) Can I use both 2005.year and i.year together?
Q3) I want to cluster the standard errors by country because doing so increases the significance of the coefficients of interest. However, when I use the -vce(cluster clustvar)- option, ‘log likelihood’ changes to ‘log pseudolikelihood’ and the Stata output shows missing values for the following statistics:

Code:

Wald chi2(20) = .
Prob > chi2 = .

Q4) Which statistic will tell me if my overall model is good?

Could someone help me in understanding the above questions?

Best regards,
Imran Khan
Clyde Schechter

Join Date: Apr 2014

Posts: 29951
#2

16 Dec 2017, 12:28

Q1. There is an -xtnbreg, fe- command. But it is not a conditional fixed-effects estimator the way, say, -xtlogit- and -xtpoisson- are. Allison does recommend doing what you propose in your post rather than using that. I don't know enough about this to give you a clear answer to this question, and perhaps others who do will chime in.

Q2. Yes. What will happen is that in the list of outputs for all the year indicators, 2005 will appear as "omitted", and the separate 2005.year indicator will be there. That's because you listed 2005.year first. Although it is not promised in the documentation, it appears that when duplicate or collinear variables are in the varlist, Stata breaks the collinearity by dropping those which appear later in the varlist.

Q3. Choosing which variance estimator to use on the basis of giving you preferred results is not science and you should not do it. You should determine whether there is need to cluster your variance estimate based on substantive issues of study design. If you do end up using the clustered VCE, remember that it reduces your residual degrees of freedom to the number of clusters minus 1. Since you have 50 countries, with the structure you have given your model you have far more than 50 predictor variables. Consequently the number of variables in the model exceeds the residual degrees of freedom and it is not possible to test the hypothesis that all coefficients are simultaneously zero. That is why you are getting missing values for the chi square statistic and its associated p-value.

Q4. In any case, the chi square statistic is not a measure of the goodness of your model. Were it available to you, all it would do is test the joint null hypothesis that all predictor coefficients are zero. So it isn't telling you that your model is good; it's just saying that your model is, literally, better than nothing. A very low bar, indeed. If you want to know if your model is a good fit to your data, you should generate predicted values and compare them to the observed values. I tend to prefer graphical comparisons to summary statistics for this purpose.
Comment
Imran Khan

Join Date: Sep 2017

Posts: 68
#3

17 Dec 2017, 23:35

Dear Clyde,

Many thanks for your reply. I have understood your answers to Q1 and Q2. However, I hope someone will chime in with an even clearer answer to Q1, since I am consuming many degrees of freedom in estimating country and year fixed effects.

As for Q3, I now understand why I am getting missing values for the chi square statistic and its associated p-value.
My rationale for clustering was that each country is observed for 10 years, and the errors for a given country may be correlated across years. Does that justify clustering the standard errors by country?

Moving on to Q4, could you tell me if I am using the right command to generate predicted values? If yes, should I then plot the predicted values against the dependent variable?

Code:

predict newvar, xb

Looking forward to your guidance.

Best regards,
Imran Khan.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#4

18 Dec 2017, 06:22

Adding to Clyde's comprehensive advice, I would say that estimating a NegBin model with country dummies is not recommended because the estimator suffers from the incidental parameter problem (at least I am not aware of any proof that it does not). My suggestion is that you stick to -xtpoisson- with FE because this estimator is very robust and it does not suffer from the IPP.
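A minimal sketch of what this suggestion might look like in practice. DV and IV are the placeholder names from #1; the choice of -vce(robust)- and the inclusion of year dummies are assumptions for illustration, not something specified in this post.

```stata
* Declare the panel structure (country = panel id, year = time)
xtset country year

* Conditional fixed-effects Poisson: country effects are conditioned out,
* so no country dummies are listed. Year dummies are kept explicitly.
* With i.year in the model, a separate 2005.year term would be collinear.
xtpoisson DV IV i.year, fe vce(robust)
```

With -xtpoisson, fe- the country effects are not estimated, which is why they do not use up degrees of freedom in the way the 49 country dummies do.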

Best wishes,

Joao
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29951
#5

18 Dec 2017, 08:01

Following -nbreg-, the -predict- option for getting predicted number of outcomes is -predict newvar, n-. If you follow Joao Santos Silva's advice and switch to -xtpoisson-, you would be best off with -predict newvar, nu0-.
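As a rough illustration of the predicted-versus-observed comparison discussed in #2, the following sketch uses the -nbreg- fit from #1. The variable name yhat and the graph options are assumptions chosen for the example.

```stata
* Unconditional negative binomial with country and year dummies, as in #1
nbreg DV IV i.year i.country

* Predicted number of events (not the linear predictor that -xb- returns)
predict double yhat, n

* Observed versus predicted counts, with a 45-degree reference line
twoway (scatter DV yhat) (function y = x, range(yhat)), ///
    xtitle("Predicted count") ytitle("Observed count") legend(off)
```

Points scattered closely around the reference line would suggest the model tracks the observed counts reasonably well; systematic departures point to misfit.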

With regard to whether clustering is appropriate to your data, the fact that you have the same countries and repeated observations doesn't necessarily call for clustering. What was the sampling design? Are these 50 countries selected from among the "universe" of all countries, or do they constitute the census of a group of countries of interest? If selected, were they selected at random, or was it systematic/deterministic? Also relevant is what your independent variable is and whether or not you expect heterogeneity of its effect on the outcome across countries.
Comment
Imran Khan

Join Date: Sep 2017

Posts: 68
#6

18 Dec 2017, 11:24

Dear Joao,

Many thanks for your reply.

Firstly, I was tempted to use negative binomial because there was evidence of overdispersion in my data. Would you still suggest that I use -xtpoisson- with FE? If yes, how can I justify the use of Poisson in my case, where overdispersion does exist?

Secondly, the reason behind using country dummies in the negative binomial model is the Allison and Waterman paper. Here is the abstract for the Allison & Waterman paper:

"This paper demonstrates that the conditional negative binomial model for panel data, proposed by Hausman, Hall, and Griliches (1984), is not a true fixed-effects method. This method, which has been implemented in both Stata and LIMDEP, does not in fact control for all stable covariates. Three alternative methods are explored. A negative multinomial model yields the same estimator as the conditional Poisson estimator and hence does not provide any additional leverage for dealing with overdispersion. On the other hand, a simulation study yields good results from applying an unconditional negative binomial regression estimator with dummy variables to represent the fixed effects. There is no evidence for any incidental parameters bias in the coefficients, and downward bias in the standard error estimates can be easily and effectively corrected using the deviance statistic. Finally, an approximate conditional method is found to perform at about the same level as the unconditional estimator."

Allison, Paul D. and Richard Waterman (2002) "Fixed effects negative binomial regression models." In Ross M. Stolzenberg (ed.), Sociological Methodology 2002. Oxford: Basil Blackwell.

Initially, I planned to use -xtnbreg- with FE. However, the following answer from a thread on Stata corp caught my attention:

"Typically for a fixed effects negative binomial model, you would want to use the -xtnbreg, fe- command. -xtnbreg, fe- is fitting a conditional fixed effects model. When you include panel dummies in -nbreg- command, you are fitting an unconditional fixed effects model. For nonlinear models such as the negative binomial model, the unconditional fixed effects estimator produces inconsistent estimates. This is caused by the incidental parameters problem. See the following references for theoretical aspects on the incidental parameters problem"

I look forward to your guidance.

Best regards,
Imran Khan.
Comment
Imran Khan

Join Date: Sep 2017

Posts: 68
#7

18 Dec 2017, 11:38

Dear Clyde,

I have completely understood how to generate the predicted values.

I included all those African countries for which the data was available. In other words, they were the countries of interest and definitely not selected at random. My independent variables are the economic and political conditions of a country and hence, I expect some heterogeneity of its effect on the outcome variable.

Do the above two points justify clustering standard errors by countries?
Moreover, if I were to include all the countries in the world for which data is available (not only Africa, but again not selected at random), should I still cluster standard errors by country?

Best regards,
Imran Khan.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29951
#8

18 Dec 2017, 12:01

I didn't ask my questions correctly. The question about sampling design was correctly asked and your answer is helpful. I should not have asked about heterogeneity of effect--that's a separate issue unrelated to whether to cluster standard errors. What I should have asked you is whether the values of your principal independent variable(s) exhibit clustering within countries. If the answer to that is yes, then clustered standard errors would be appropriate. If not, then you should not cluster. If you want the gory statistical details, see https://economics.mit.edu/files/13927.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#9

18 Dec 2017, 15:24

Dear Imran,

You do not say what you want to do with your model, but unless you want to compute probabilities of events it is likely that you can safely ignore overdispersion and opt for an estimator that is quite robust to distributional assumptions. If you use Poisson regression (with robust standard errors), you can explicitly include the dummies in the model (as you are doing for the NB) and obtain the predictions directly from the model.
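A hedged sketch of the Poisson-with-dummies alternative described above, again using the placeholder names DV and IV from #1. The new variable name yhat_pois is hypothetical, and -vce(cluster country)- is one robust choice among several; -vce(robust)- would be the other obvious candidate.

```stata
* Poisson regression with explicit country and year dummies.
* Cluster-robust standard errors make inference robust to overdispersion
* and to within-country correlation of the errors.
poisson DV IV i.year i.country, vce(cluster country)

* Predicted counts come directly from the fitted model
predict double yhat_pois, n
```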

I am aware of the Allison and Waterman (2002) paper, but as far as I understand their results do not prove that there is no IPP in the negative binomial model; the favourable results they obtain in the simulations may be specific to the particular simulation design they considered.

Indeed, the -xtnbreg- with FE does not do what I believe you want to do, so it is better not to use it. Can you please give the link for the quote you provided?

Best wishes,

Joao
Comment
Imran Khan

Join Date: Sep 2017

Posts: 68
#10

19 Dec 2017, 08:10

Dear Clyde,

Many thanks for your reply. I have read the document, and although I found some parts of it difficult to understand, overall it was very helpful.

The principal independent variable is foreign aid, and the amount of foreign aid keeps fluctuating depending on the foreign policy of the donors as well as the economic and political conditions of the recipient countries (panel variable). Given this background, if I have understood your question correctly, I can expect that the principal independent variable (foreign aid) exhibits clustering within countries (for instance, if a country has been hit by a natural disaster, it would receive high amounts of aid during that period, or if donors lose interest in some of the recipient countries, the volume of aid would decline for those countries).

Does this explanation further justify the clustering of standard errors by countries along with the explanation that I have selected all those aid-recipient countries for which the data is available?

Best regards,
Imran Khan.
Comment
Imran Khan

Join Date: Sep 2017

Posts: 68
#11

19 Dec 2017, 08:11

Dear Joao,

I am interested in estimating the impact of foreign aid coming from some specific donors on the number of foreign aid conditions imposed on recipient countries. In addition, I am not interested in the dummies themselves. I only wanted to control for fixed effects, and the reason I put country and year dummies in the negative binomial model was my understanding that -xtnbreg- with FE is not a true fixed-effects model. I am wondering if you could provide a reference with which I can support the idea of safely ignoring overdispersion?

Moreover, could you highlight some of the Stata commands for checking the goodness of fit of the model other than -poisgof-? Do you think it is a good idea to check the goodness of fit of the model to further support the use of Poisson instead of a negative binomial model?

The link from which the quote has been taken is as follows:

https://www.stata.com/statalist/arch.../msg00398.html

I look forward to your response.

Best regards,
Imran Khan.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29951
#12

19 Dec 2017, 08:57

Re #10: This is admittedly a difficult problem to understand, but, with my limited understanding of the subject matter here (not my field at all), I'm inclined to agree with you that clustering the errors on country would be appropriate here.
Comment
Joao Santos Silva

Join Date: Apr 2014

Posts: 3000
#13

19 Dec 2017, 15:39

Thanks for providing the link. As for the reference, please check Wooldridge's book. Finally, I would not worry about the g-o-f; just do a RESET test.
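One common way to carry out a RESET-type check after a Poisson regression is sketched below. This is an assumption about what is meant here, not something spelled out in this post; the variable names xb_hat and xb_hat2 are hypothetical, and DV and IV are the placeholders from #1.

```stata
* Fit the model and save the linear prediction
poisson DV IV i.year i.country, vce(cluster country)
predict double xb_hat, xb

* Add the squared linear prediction as an extra regressor
generate double xb_hat2 = xb_hat^2
poisson DV IV i.year i.country xb_hat2, vce(cluster country)

* Under correct specification of the conditional mean, the coefficient
* on xb_hat2 should not differ significantly from zero
test xb_hat2
```

A small p-value from -test- would point to misspecification of the conditional mean, for example omitted nonlinearities.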

Best wishes,

Joao
Comment
Imran Khan

Join Date: Sep 2017

Posts: 68
#14

20 Dec 2017, 15:49

Dear Clyde,

Many thanks for your reply.
Your replies have been very helpful in clearing up the confusion I had about my model.

Best regards,
Imran Khan
Comment
Imran Khan

Join Date: Sep 2017

Posts: 68
#15

20 Dec 2017, 15:51

Dear Joao,

Many thanks for your advice throughout.
As per your suggestion, my next step will be to perform the RESET test.

Best regards,
Imran Khan.
Comment
