  • choosing xtpoisson options

    Hi. I’ve been reading up on xtpoisson on Statalist and in Rabe-Hesketh & Skrondal’s book “Multilevel and Longitudinal Modeling Using Stata” (an older edition). However, I still don’t have a complete grasp of which options to use and how to examine post-estimation predictions. So I’m looking for general advice on what to read, and I'd like to know whether I’m headed in the right direction.

    The goal of this research is to describe the association of the number of forms filed at a government agency with (1) policy options that some states (United States) implemented or changed over the periods observed and (2) interventions by advocates to increase agency compliance with federal laws meant to make the forms more accessible.

    The variables:
    • forms: The DV is the sum of forms submitted in a state over a two-year period. Over six biennial periods, 13 states are observed. In four periods, one state is missing data (a different state three of those times).
    Variable |  Obs       Mean   Std. Dev.    Min      Max
    ---------+---------------------------------------------
    forms    |   74   310924.5    316870.4   2505  1682350
    • statefip: state id code; 13 states.
    • period: the two-year periods are coded from one to six
    • pool100k: the number of people participating in the program in the even-numbered year of the period divided by 100,000. The number of forms filed in a year doesn’t equal the number eligible because program participants do not need to file each year. Thus, this is a proxy for state population size and state characteristics (i.e., not all states have the same degree of demand for the benefits).
    • renewal: number of years before new forms must be filed; ranges from four to 10. Four of the 13 states increased the length over time; none reduced it. Longer renewal periods should decrease the dependent variable.
    • policy: binary for an optional policy that might increase accessibility to the form (in 2008, two states out of the 13 had the policy and by 2018, five states did); state-periods with this policy should be associated with a higher dependent variable
    • intervention: mutually exclusive intervention categories: 0 = no external intervention in the state-period; 1 = advocates warned the state about non-compliance early in that period; 2 = the state-period is covered by improvements made by the state after advocates pointed out non-compliance; 3 = the state-period is covered by a litigation settlement to improve compliance with federal law. For a few states, the two years after a settlement expired are coded as under settlement, based on evidence that those states didn’t change the policies the settlements altered; the other states either have not yet exited settlements or experienced only the milder interventions.
    Only the 13 states with evidence that they ever faced an intervention over the 12 years are included. That is, the comparison is between the number of forms in state-periods that may have had poor compliance and the number in state-periods during an intervention. Additional states might be added if we confirm that (and when) they experienced interventions.

    Here is what I ran in Stata 15.1:

    Code:
    xtset statefip period
    Code:
    xtpoisson forms i.period c.pool100k c.renewal i.policy i.interven, pa vce(robust) i(statefip)
    The results show statistically and substantively significant associations in the hypothesized directions for all predictors in the model, except for some of the early periods.
    Questions:
    1. From the description, should the model be PA, RE, or FE? I am familiar with FE when using xtreg, but it sounds like that isn’t comparable to FE in xtpoisson.
    2. Since the dependent variable is the absolute count of forms, I assume I don’t need an offset or exposure option. Is that right?
    3. Is the corr(exchangeable) the right option here (the default)?
    4. Sponsors of this evaluation would be much more comfortable with results reported as numbers of forms, as opposed to IRRs. Post-estimation, the -margins i.interven, contrast(eff)- command gives predictions that look reasonable from inspecting the data and from running the model as -xtreg …, fe i(statefip) cluster(statefip)-. Is the margins command after xtpoisson giving me results in numbers of forms?
    5. If FE or RE is a better specification of the model, how do I get from IRRs or coefficients back to units of forms? I tried following advice given in other threads, but I wasn't getting results in the original units.
    6. Finally, are there other books or lectures on xtpoisson I might benefit from reading? The Stata manual seems to have a higher learning curve than I would like, so some other readings for reassurance or examples would help.
    Thank you all.

  • #2
    Happy New Year. Just following up on this. I'm reading the late Joseph Hilbe's "Modeling Count Data" (Cambridge University Press), which I wanted to mention as others may find it helpful. Based on that (or my understanding of the advice in there) and some trial and error, I've decided that perhaps -xtnbreg ..., pa vce(robust) i(statefip)- is the way to go. I would appreciate any thoughts on the appropriateness of the PA option for this research question. Let me know if I can provide more information. Thank you.
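    Concretely, the specification I'm considering looks like this (a sketch only; I'm relying on the default exchangeable working correlation, and the covariate list is just what I described above):

    Code:
    * Sketch of the PA negative binomial under consideration
    xtset statefip period
    xtnbreg forms i.period c.pool100k i.policy i.interven, pa vce(robust)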



    • #3
      There are plenty of books on count data. Hilbe's book is a great start if you really want the details of count data analysis. There's also Econometric Analysis of Count Data by Winkelmann, and plenty of others, including Hilbe's Negative Binomial Regression and Generalized Linear Models by Hilbe and Scott Long.

      The FE estimator is appropriate whenever you think there's unobserved heterogeneity, which is almost always the case, although it's an empirical question whether an RE model fits better. I'd need to read again about the corr() option you cite. An offset is appropriate when you're studying a rate; if you're modeling the raw count, no offset is necessary. I never use margins, so I can't comment on its efficacy.
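      For concreteness, the three estimators you asked about can be sketched side by side (variable names taken from #1; which covariates to include is of course your call):

      Code:
      xtset statefip period
      * Population-averaged (GEE); corr(exchangeable) is the default
      xtpoisson forms i.period c.pool100k i.policy i.interven, pa vce(robust)
      * Random effects (gamma heterogeneity by default)
      xtpoisson forms i.period c.pool100k i.policy i.interven, re vce(robust)
      * Conditional fixed effects; time-invariant regressors drop out
      xtpoisson forms i.period c.pool100k i.policy i.interven, fe vce(robust)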

      However, and I think it's almost obligatory for me to say this, the main thing I would concern myself with would be the design of your study. Yes, the parametric estimator we use matters and having the basic stats down is important, but to study the effects of policy (as it seems like you're interested in), how you design the paper is far more important than whatever analytic approach you take. How are you designing your analysis, difference-in-differences, interrupted time series..?

      For me (or anyone) to have much say on that though, we'd need example data with the dataex command. Doug Hess



      • #4
        Originally posted by Jared Greathouse View Post
        However, and I think it's almost obligatory for me to say this, the main thing I would concern myself with would be the design of your study. Yes, the parametric estimator we use matters and having the basic stats down is important, but to study the effects of policy (as it seems like you're interested in), how you design the paper is far more important than whatever analytic approach you take. How are you designing your analysis, difference-in-differences, interrupted time series..?

        For me (or anyone) to have much say on that though, we'd need example data with the dataex command. Doug Hess
        Thanks, Jared. For now, I'm running cross-sectional time-series models. Thus, I'm wondering whether -xtnbreg- is appropriate and how to weigh -pa- vs. -fe- and other options. I've not considered difference-in-differences with multiple categorical levels of an intervention (treatment) before.

        Here's the full dataset. (I've dropped, for now, the "renewal" variable that I mentioned above because it's not clear what lag I should use when the renewal period changes and there are only six years. I've included "ratio" as a variable, but would rather use the count variable "forms" because it's not clear that the ratio is meaningful until I reconsider the impact of the renewal period.)

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte statefip int year byte policy long(forms pool) byte(interven period) float(ratio pool100k)
         1 2008 0   32132  3753550 0 1  .008560429   37.5355
         1 2010 0   14232  3805751 0 2 .0037396036  38.05751
         1 2012 0   23023  3827522 0 3  .006015119  38.27522
         1 2014 0   10031  3881542 0 4  .002584282  38.81542
         1 2016 0  396281  3943082 3 5   .10050032  39.43082
         1 2018 0 1070491  3999057 3 6   .26768586  39.99057
         4 2008 0  410132  4315579 0 1   .09503522  43.15579
         4 2010 0  704035  4443647 0 2    .1584363  44.43647
         4 2012 0  394446  4697579 0 3   .08396793  46.97579
         4 2014 0  639948  4881801 0 4   .13108851  48.81801
         4 2016 0  768978  5082305 0 5   .15130498  50.82305
         4 2018 0  888201  5284970 2 6    .1680617   52.8497
         6 2008 0 1592764 23697667 0 1   .06721185 236.97667
         6 2010 0  608765 23753441 0 2    .0256285  237.5344
         6 2012 1  703751 24200997 0 3   .02907942 242.00996
         6 2014 1  854031 24813346 0 4   .03441821 248.13345
         6 2016 1  694209 26199436 2 5    .0264971 261.99435
         6 2018 1 1932780 27039400 3 6   .07148014   270.394
         8 2008 0  647411  3605682 0 1     .179553  36.05682
         8 2010 0  642975  3779273 0 2    .1701319  37.79273
         8 2012 0  469786  3807673 0 3   .12337877  38.07673
         8 2014 1  325857  3883362 0 4   .08391105  38.83362
         8 2016 1  468901  4066580 0 5   .11530598   40.6658
         8 2018 1  782426  4244713 1 6   .18432954  42.44713
         9 2008 0   33428  2883324 0 1  .011593563  28.83324
         9 2010 0   20317  2934576 0 2  .006923317  29.34576
         9 2012 0   20537  2485708 0 3  .008262033  24.85708
         9 2014 1   26551  2542588 0 4   .01044251  25.42588
         9 2016 1  180240  2611007 2 5  .069030836  26.11007
         9 2018 1  283609  2605612 3 6   .10884544  26.05612
        29 2008 0  174385  4196682 0 1   .04155307  41.96682
        29 2010 0  244438  4246249 0 2   .05756563  42.46249
        29 2012 0  268191  4288488 0 3  .062537424  42.88488
        29 2014 0  253058  4295224 0 4   .05891614  42.95224
        29 2016 0  288438  4249579 0 5   .06787449  42.49579
        29 2018 0  242475  4272960 2 6   .05674638   42.7296
        30 2008 1   25699   738982 0 1   .03477622   7.38982
        30 2010 1   28198   743611 0 2   .03792036   7.43611
        30 2012 1   36534   757812 0 3   .04820985   7.57812
        30 2014 1   26853   768703 0 4  .034932867   7.68703
        30 2016 1   56547   797145 1 5    .0709369   7.97145
        30 2018 1   70739   806204 1 6    .0877433   8.06204
        32 2008 0  117648  1678550 0 1  .070089065   16.7855
        32 2010 0   39061  1691318 0 2  .023095006  16.91318
        32 2012 0  138368  1728060 0 3    .0800713   17.2806
        32 2014 0   71961  1796443 0 4   .04005749  17.96443
        32 2016 0  136014  1872376 2 5   .07264246  18.72376
        32 2018 0  143882  1983453 3 6   .07254117  19.83453
        34 2008 0  195773  5782155 0 1   .03385814  57.82155
        34 2010 0  132352  5952583 0 2   .02223438  59.52583
        34 2012 0  520206  6039623 0 3    .0861322  60.39623
        34 2014 0       .  6152634 0 4           .  61.52634
        34 2016 0  867555  6238436 1 5    .1390661  62.38436
        34 2018 0 1152284  6342876 1 6   .18166585  63.42876
        35 2008 0    2765  1365249 2 1 .0020252715  13.65249
        35 2010 0    3383  1405926 3 2  .002406243  14.05926
        35 2012 0   24572  1430475 3 3  .017177511  14.30475
        35 2014 0   37411  1444857 3 4   .02589253  14.44857
        36 2008 0       . 11284545 0 1           . 112.84545
        36 2010 0  578320 11285830 0 2   .05124302  112.8583
        36 2012 0  638065 11248617 0 3   .05672386 112.48617
        36 2014 0  410307 11318198 0 4  .036251973 113.18198
        36 2016 0  825007 11947568 0 5  .069052294 119.47568
        36 2018 0 1232349 12194360 3 6   .10105893  121.9436
        37 2008 1  759954  6457000 0 1    .1176946     64.57
        37 2010 1  506608  6536601 0 2   .07750328  65.36601
        37 2012 1  616206  6677693 0 3   .09227828  66.77693
        37 2014 1  537088  7025333 0 4   .07645018  70.25333
        37 2016 1 1108923  7267042 2 5    .1525962  72.67042
        37 2018 1 1264821  7509231 3 6    .1684355  75.09231
        40 2008 0  188312  2301848 0 1   .08180905  23.01848
        40 2010 0   94443  2348718 0 2   .04021045  23.48718
        40 2012 0  144183  2400358 0 3   .06006729  24.00358
        40 2014 0   84461  2451972 0 4   .03444615  24.51972
        40 2016 0  231263  2498178 0 5   .09257267  24.98178
        40 2018 0  249945  2504253 1 6    .0998082  25.04253
        end
        label values interven interven
        label def interven 0 "Normal", modify
        label def interven 1 "Tech Asst", modify
        label def interven 2 "Letter", modify
        label def interven 3 "Agreement", modify



        • #5
          With such large counts there may be no need for a nonlinear model. If you use one, it should be xtpoisson with fixed effects and vce(robust). But you can use log(y) in a linear fixed effects estimation as a comparison. You can then include log(pool) as an explanatory variable. Because you do have a natural upper bound (pool) for y (forms), in principle you should use binomial regression. But it's harder to do a fixed effects type analysis. With a pretty small N, I would start with log(forms) and log(pool) and use fixed effects -- assuming your intervention variable changes over time.
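          Sketched with this thread's variable names (constructing the logs and the exact covariate list are assumptions on my part):

          Code:
          gen lnforms = ln(forms)
          gen lnpool  = ln(pool)
          xtset statefip year
          * Linear fixed effects on the log scale
          xtreg lnforms i.year c.lnpool i.policy i.interven, fe vce(cluster statefip)
          * Exponential-mean counterpart; coefficients are directly comparable
          xtpoisson forms i.year c.lnpool i.policy i.interven, fe vce(robust)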



          • #6
            Originally posted by Jeff Wooldridge View Post
            With such large counts there may be no need for a nonlinear model. If you use one, it should be xtpoisson with fixed effects and vce(robust). But you can use log(y) in a linear fixed effects estimation as a comparison. You can then include log(pool) as an explanatory variable. Because you do have a natural upper bound (pool) for y (forms), in principle you should use binomial regression. But it's harder to do a fixed effects type analysis. With a pretty small N, I would start with log(forms) and log(pool) and use fixed effects -- assuming your intervention variable changes over time.
            Thank you. After running -xtpoisson forms i.year c.pool i.policy i.interven, fe vce(robust) i(statefip)- how do I convert the predicted outcome back to the original unit?
            Also, I notice no constant in the results. Is that normal?
            Last edited by Doug Hess; 05 Jan 2022, 13:26.



            • #7
              Originally posted by Jeff Wooldridge View Post
              With such large counts there may be no need for a nonlinear model. If you use one, it should be xtpoisson with fixed effects and vce(robust). But you can use log(y) in a linear fixed effects estimation as a comparison. You can then include log(pool) as an explanatory variable. Because you do have a natural upper bound (pool) for y (forms), in principle you should use binomial regression. But it's harder to do a fixed effects type analysis. With a pretty small N, I would start with log(forms) and log(pool) and use fixed effects -- assuming your intervention variable changes over time.
              Here's my analysis code using -reg- and then -xtreg-. I included -reg- because I don't know how to get from the coefficients in -xtreg- with the log of the dep variable back to a count of forms. I also don't know how to check residuals with coef in log form. Ideally, I'd also like to check the contrast between the intervention levels of "technical assistance" vs "agreement."

              Thanks.

              Code:
               gen lnforms = ln(forms)
               gen lnpool  = ln(pool)
               reg lnforms i.statefip i.year c.lnpool i.policy* i.interven, cluster(statefip)
               * Retransform fitted values to the count scale, applying the
               * normal-theory smearing factor exp(rmse^2/2)
               predict yhat if e(sample)
               replace yhat = exp(yhat) if e(sample)
               replace yhat = yhat*exp(e(rmse)^2/2) if e(sample)

               sum yhat pool ln* if e(sample)

              [output for reg, predict, and replace cmds omitted]
               Variable |  Obs       Mean   Std. Dev.       Min       Max
               ---------+------------------------------------------------
               yhat     |   84   583184.5    676680.2  5649.503   3184063
               pool     |   84    6014102     6452892    738982  2.72e+07
               lnforms  |   84   12.32938    1.527221  7.924796  14.91323
               lnpool   |   84   15.20726    .8809382  13.51303  17.11923
              Code:
              xtset statefip year
              xtreg lnforms i.year c.lnpool i.policy* i.inter, cluster(statefip) fe
               Fixed-effects (within) regression               Number of obs    =        84
               Group variable: statefip                        Number of groups =        13
               R-sq:                                           Obs per group:
                    within  = 0.6045                                        min =         4
                    between = 0.7049                                        avg =       6.5
                    overall = 0.4173                                        max =         7
                                                               F(12,12)         =     36.50
               corr(u_i, Xb) = -0.9762                         Prob > F         =    0.0000
               (Std. Err. adjusted for 13 clusters in statefip)
               (Coef. for years omitted)
               ----------------------------------------------------------------------------
                            |               Robust
                    lnforms |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
               -------------+--------------------------------------------------------------
                     lnpool |  -4.273417   2.020297   -2.12   0.056   -8.675266    .1284321
                  1.policy1 |  -.3265637   .3595665   -0.91   0.382   -1.109992    .4568644
                  1.policy2 |   .6061889   .2532976    2.39   0.034    .0543007    1.158077
                      inter |
                  Tech Asst |   .8046703   .2888487    2.79   0.016     .175323    1.434018
                     Letter |   .2817154   .2597435    1.08   0.299    -.284217    .8476479
                  Agreement |   1.400928   .5139866    2.73   0.018     .281048    2.520809
                      _cons |    76.7993   30.68278    2.50   0.028    9.947257    143.6513
               -------------+--------------------------------------------------------------
                    sigma_u |  5.149558
                    sigma_e |  .5817182
                        rho |  .98739977   (fraction of variance due to u_i)
              Last edited by Doug Hess; 05 Jan 2022, 16:44.



              • #8
                What question are you trying to answer? How come you're interested in fitted values? Isn't this a policy analysis?

                To first order, a linear model for log(forms) is similar to an exponential model for forms. So compare the xtreg above to the following:

                Code:
                xtset statefip year
                xtpoisson forms i.year c.lnpool i.policy* i.inter, fe vce(robust)
                The coefficients on the policy variables seem large in the log-linear model, so they might not be so similar to the exponential model estimated by Poisson FE. But I'm guessing they're similar.



                • #9
                  I'm trying to answer the question: What effect did the policy changes (made by states voluntarily) and the interventions (imposed by advocates by litigation or threat of litigation) have? Thus, I need to report the impact in the number of forms.

                  Code:
                  xtpoisson forms i.year c.lnpool i.policy* i.inter, fe vce(robust)
                  Iteration 0: log pseudolikelihood = -6921699.5
                  Iteration 1: log pseudolikelihood = -2195604.1
                  Iteration 2: log pseudolikelihood = -2010131.7
                  Iteration 3: log pseudolikelihood = -2009868.4
                  Iteration 4: log pseudolikelihood = -2009868.4
                   Conditional fixed-effects Poisson regression    Number of obs    =        84
                   Group variable: statefip                        Number of groups =        13
                                                                   Obs per group:
                                                                                min =         4
                                                                                avg =       6.5
                                                                                max =         7
                                                                   Wald chi2(12)    =  1.33e+10
                   Log pseudolikelihood = -2009868.4               Prob > chi2      =    0.0000
                   (Std. Err. adjusted for clustering on statefip)
                   ----------------------------------------------------------------------------
                                |               Robust
                          forms |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
                   -------------+--------------------------------------------------------------
                           year |
                           2010 |  -.1966024   .2522316   -0.78   0.436   -.6909673    .2977626
                           2012 |   .0697461    .189955    0.37   0.713   -.3025588     .442051
                           2014 |   .1323234    .259001    0.51   0.609   -.3753091     .639956
                           2016 |    .725338   .4066185    1.78   0.074   -.0716196    1.522296
                           2018 |   .8284978   .5896272    1.41   0.160   -.3271504    1.984146
                           2020 |   1.036247   .5799894    1.79   0.074   -.1005114    2.173005
                         lnpool |  -3.392932   2.993424   -1.13   0.257   -9.259934    2.474071
                      1.policy1 |  -.5430835   .1191101   -4.56   0.000   -.7765349   -.3096321
                      1.policy2 |   .4928936   .1145222    4.30   0.000    .2684341     .717353
                          inter |
                      Tech Asst |   .2773952   .1710067    1.62   0.105   -.0577717    .6125622
                         Letter |  -.0121658   .1546375   -0.08   0.937   -.3152496     .290918
                      Agreement |   .3247397   .1463048    2.22   0.026    .0379877    .6114918



                  • #10
                    To be clearer, the funders of the research will want findings in "number of forms." They won't know how to interpret--if anybody does--coefficients from xtpoisson.

                    Also, thank you for your help with this. I'm posting on Facebook that a "famous author" is responding to my questions. : )
                    Last edited by Doug Hess; 05 Jan 2022, 17:07.



                    • #11
                       The coefficients in either the log-linear or the exponential model have a percentage interpretation, so you can state that, "on average, the number of filed forms went up xx percent." But if you want effects in numbers of forms, definitely use xtpoisson and the margins command on the policy variables.



                      • #12
                        Originally posted by Jeff Wooldridge View Post
                        The coefficients in either the log-linear or the exponential model have a percentage interpretation, so you can state that, "on average, the number of filed forms went up xx percent." But if you want effects in numbers of forms, definitely use xtpoisson and the margins command on the policy variables.
                        Thanks. But that is the problem: after -xtpoisson, fe-, running -margins interv, predict(nu0)- produces the notes "numerical derivatives are approximate" and "nearby values are missing." The margins table also includes bizarre results:
                                     |     Margin   Std. Err.     z    P>|z|   [95% Conf. Interval]
                        -------------+-----------------------------------------------------------
                               inter |
                              Normal |   9.60e-22   3.92e-20   0.02   0.980   -7.59e-20   7.78e-20
                           Tech Asst |   1.27e-21   5.19e-20   0.02   0.981   -1.00e-19   1.03e-19
                              Letter |   9.48e-22   3.87e-20   0.02   0.980   -7.50e-20   7.69e-20
                           Agreement |   1.33e-21   5.42e-20   0.02   0.980   -1.05e-19   1.08e-19



                        • #13
                          Dear Doug Hess,

                          The problem is that margins cannot meaningfully be used after this command. I have been asking Stata to deal with this for some time now, but without success:

                          Originally posted by Joao Santos Silva View Post
                          As discussed here, margins should not be available after nonlinear models with fixed effects are estimated. The explanation for that is simple: any interesting quantity that we may want to compute will depend on the values of the fixed effects, which are not estimated by these commands. Therefore, margins computes something that most of the time is meaningless. This could be done in a future update, but at least it would be good to have this looked into in the next version.
                          Your example provides another illustration of the problem and hopefully this time someone at StataCorp will look into this.

                          Best wishes,

                          Joao



                          • #14
                            Originally posted by Joao Santos Silva View Post
                            Dear Doug Hess,
                            The problem is that margins cannot meaningfully be used after this command. I have been asking Stata to deal with this for some time now, but without success...
                            Your example provides another illustration of the problem and hopefully this time someone at StataCorp will look into this.
                            Thanks, Joao. The documentation and the examples given in the Stata manual need elaboration. I'm hoping there's a solution in Hilbe's book.
                            Can I ask when it's appropriate to use the PA option instead of FE?



                            • #15
                              In the case of multiplicative heterogeneity -- as with the FE Poisson setting -- calculation of average partial effects can make sense. My former student, Robert Martin (now at BLS), did part of his dissertation on this. In particular, it is possible to estimate the mean of the heterogeneity and then replace c(i) [the notation I use] with E[c(i)]. Bob has a couple of ways of doing that. What Joao is referring to, I think, is inserting estimates of c(i) in place of the c(i). These are poor estimates because of small T; however, averaged across i they become good estimates of E[c(i)]. I should look into how Stata is using margins after xtpoisson.

                              Another possibility is to use a correlated random effects approach and then apply pooled Poisson. That would be essentially the PA option, except you include the time averages of the x(i,t) variables. So like this:

                              Code:
                              egen x1bar = mean(x1), by(statefip)
                              egen xKbar = mean(xK), by(statefip)
                               poisson y x1 ... xK z1 ... zJ x1bar ... xKbar i.year, vce(cluster statefip)
                              margins, dydx(*)
                              If we replace poisson with reg this would give the usual linear FE estimates. There would be none of the problem that Joao is referring to. Hopefully the coefficients on the xj are similar to Poisson FE.
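                              A concrete version of that template using this thread's variables might look as follows (which time-varying covariates to include and average is an assumption on my part; adjust to your specification):

                              Code:
                              * Correlated random effects (Mundlak) pooled Poisson:
                              * add unit means of the time-varying covariates
                              egen pool100k_bar = mean(pool100k), by(statefip)
                              egen policy_bar   = mean(policy), by(statefip)
                              poisson forms c.pool100k i.policy i.interven ///
                                  c.pool100k_bar c.policy_bar i.year, vce(cluster statefip)
                              * Average partial effects in units of forms
                              margins, dydx(policy interven)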

