  • choosing xtpoisson options

    Hi. I’ve been reading up on xtpoisson on Statalist and in Rabe-Hesketh & Skrondal’s book “Multilevel and Longitudinal Modeling Using Stata” (an older edition). However, I still don’t have a complete grasp of which options to use and how to examine post-estimation predictions. So I’m looking for general advice on what to read, and I'd like to know whether I’m headed in the right direction.

    The goal of this research is to describe the association of the number of forms filed at a government agency with (1) policy options that some states (United States) implemented or changed over the periods observed and (2) interventions by advocates to increase agency compliance with federal laws meant to make the forms more accessible.

    The variables:
    • forms: The DV is the sum of forms submitted in a state over a two-year period. Over six biennial periods, 13 states are observed. In four periods, one state is missing data (a different state three of those times).
    Variable |  Obs       Mean   Std. Dev.    Min      Max
    ---------+---------------------------------------------
    forms    |   74   310924.5    316870.4   2505  1682350
    • statefip: state id code; 13 states.
    • period: the two-year periods are coded from one to six
    • pool100k: the number of people participating in the program in the even-numbered year of the period divided by 100,000. The number of forms filed in a year doesn’t equal the number eligible because program participants do not need to file each year. Thus, this is a proxy for state population size and state characteristics (i.e., not all states have the same degree of demand for the benefits).
    • renewal: number of years before new forms must be filed; ranges from four to 10. Four of the 13 states increased the length over time; none reduced it. Longer renewal periods should decrease the dependent variable.
    • policy: binary for an optional policy that might increase accessibility to the form (in 2008, two states out of the 13 had the policy and by 2018, five states did); state-periods with this policy should be associated with a higher dependent variable
    • intervention: mutually exclusive intervention categories: 0 = no external intervention in the state-period; 1 = advocates warned the state about non-compliance early in that period; 2 = the state-period is covered by improvements made by the state after advocates pointed out non-compliance; 3 = the state-period is covered by a litigation settlement to improve compliance with federal law. For a few states, the two years after a settlement expired are coded as under settlement, based on evidence that those states didn’t change the policies the settlements altered; the other states either have not yet exited settlements or experienced only the milder interventions.
    Only the 13 states with evidence that they ever faced an intervention over the 12 years are included. That is, the comparison is between the number of forms in state-periods that may have had poor compliance and the number in state-periods during an intervention. Additional states might be added if we confirm that (and when) they experienced interventions.

    Here is what I ran in Stata 15.1:

    Code:
    xtset statefip period
    Code:
    xtpoisson forms i.period c.pool100k c.renewal i.policy i.interven, pa vce(robust) i(statefip)
    The results show statistically and substantively significant associations in the hypothesized directions for all predictors in the model, except for some of the early periods.
    Questions:
    1. From the description, should the model be PA, RE, or FE? I am familiar with FE when using xtreg, but it sounds like that isn’t comparable to FE in xtpoisson.
    2. Since the dependent variable is the absolute count of forms, I assume I don’t need an offset or exposure option. Is that right?
    3. Is the corr(exchangeable) the right option here (the default)?
    4. Sponsors of this evaluation would be much more comfortable with results reported as numbers of forms, as opposed to IRRs. Post-estimation, the -margins i.interven, contrast(eff)- command gives predictions that look reasonable from inspecting the data and from running the model as -xtreg …, fe i(statefip) cluster(statefip)-. Is the margins command after xtpoisson giving me results in numbers of forms?
    5. If FE or RE is a better specification of the model, how do I get from IRRs or coefficients back to units of forms? I tried following advice given in other threads, but I wasn't getting results in the original units.
    6. Finally, are there other books or lectures on xtpoisson I might benefit from reading? The Stata manual seems to have a higher learning curve than I would like, so some other readings for reassurance or examples would help.
    Thank you all.

  • #2
    Happy New Year. Just following up on this. I'm reading the late Joseph Hilbe's "Modeling Count Data" (Cambridge University Press), which I wanted to mention as others may find it helpful. Based on that (or my understanding of the advice in there) and some trial and error, I've decided that perhaps -xtnbreg ..., pa vce(robust) i(statefip)- is the way to go. I would appreciate any thoughts on the appropriateness of the PA option for this research question. Let me know if I can provide more information. Thank you.
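    Concretely, the specification I'm considering looks like this (a sketch only; I'm relying on the default exchangeable working correlation, and the covariate list is just what I described above):

    Code:
    * Sketch of the PA negative binomial under consideration
    xtset statefip period
    xtnbreg forms i.period c.pool100k i.policy i.interven, pa vce(robust)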



    • #3
      There are plenty of books on count data. Hilbe's book is a great start if you really want the details of count data analysis. There's also Econometric Analysis of Count Data by Winkelmann, and plenty of others, including Hilbe's Negative Binomial Regression and Generalized Linear Models by Hilbe and Scott Long.

      The FE estimator is appropriate whenever you think there's unobserved heterogeneity, which is almost always the case, although it's an empirical question whether an RE model fits better. I'd need to read again about the corr() option you cite. An offset is appropriate when you're studying a rate; if you're modeling the raw count, no offset is necessary. I never use margins, so I can't comment on its efficacy.
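      For concreteness, the three estimators you asked about can be sketched side by side (variable names taken from #1; which covariates to include is of course your call):

      Code:
      xtset statefip period
      * Population-averaged (GEE); corr(exchangeable) is the default
      xtpoisson forms i.period c.pool100k i.policy i.interven, pa vce(robust)
      * Random effects (gamma heterogeneity by default)
      xtpoisson forms i.period c.pool100k i.policy i.interven, re vce(robust)
      * Conditional fixed effects; time-invariant regressors drop out
      xtpoisson forms i.period c.pool100k i.policy i.interven, fe vce(robust)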

      However, and I think it's almost obligatory for me to say this, the main thing I would concern myself with would be the design of your study. Yes, the parametric estimator we use matters and having the basic stats down is important, but to study the effects of policy (as it seems like you're interested in), how you design the paper is far more important than whatever analytic approach you take. How are you designing your analysis, difference-in-differences, interrupted time series..?

      For me (or anyone) to have much say on that though, we'd need example data with the dataex command. Doug Hess



      • #4
        Originally posted by Jared Greathouse View Post
        However, and I think it's almost obligatory for me to say this, the main thing I would concern myself with would be the design of your study. Yes, the parametric estimator we use matters and having the basic stats down is important, but to study the effects of policy (as it seems like you're interested in), how you design the paper is far more important than whatever analytic approach you take. How are you designing your analysis, difference-in-differences, interrupted time series..?

        For me (or anyone) to have much say on that though, we'd need example data with the dataex command. Doug Hess
        Thanks, Jared. For now, I'm running cross-sectional time-series models. Thus, I'm wondering whether -xtnbreg- is appropriate and how to weigh -pa- vs. -fe- and other options. I've not considered difference-in-differences with multiple categorical levels of an intervention (treatment) before.

        Here's the full dataset. (I've dropped, for now, the "renewal" variable that I mentioned above because it's not clear what lag I should use when the renewal period changes and there are only six years. I've included "ratio" as a variable, but would rather use the count variable "forms" because it's not clear that the ratio is meaningful until I reconsider the impact of the renewal period.)

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte statefip int year byte policy long(forms pool) byte(interven period) float(ratio pool100k)
         1 2008 0   32132  3753550 0 1  .008560429   37.5355
         1 2010 0   14232  3805751 0 2 .0037396036  38.05751
         1 2012 0   23023  3827522 0 3  .006015119  38.27522
         1 2014 0   10031  3881542 0 4  .002584282  38.81542
         1 2016 0  396281  3943082 3 5   .10050032  39.43082
         1 2018 0 1070491  3999057 3 6   .26768586  39.99057
         4 2008 0  410132  4315579 0 1   .09503522  43.15579
         4 2010 0  704035  4443647 0 2    .1584363  44.43647
         4 2012 0  394446  4697579 0 3   .08396793  46.97579
         4 2014 0  639948  4881801 0 4   .13108851  48.81801
         4 2016 0  768978  5082305 0 5   .15130498  50.82305
         4 2018 0  888201  5284970 2 6    .1680617   52.8497
         6 2008 0 1592764 23697667 0 1   .06721185 236.97667
         6 2010 0  608765 23753441 0 2    .0256285  237.5344
         6 2012 1  703751 24200997 0 3   .02907942 242.00996
         6 2014 1  854031 24813346 0 4   .03441821 248.13345
         6 2016 1  694209 26199436 2 5    .0264971 261.99435
         6 2018 1 1932780 27039400 3 6   .07148014   270.394
         8 2008 0  647411  3605682 0 1     .179553  36.05682
         8 2010 0  642975  3779273 0 2    .1701319  37.79273
         8 2012 0  469786  3807673 0 3   .12337877  38.07673
         8 2014 1  325857  3883362 0 4   .08391105  38.83362
         8 2016 1  468901  4066580 0 5   .11530598   40.6658
         8 2018 1  782426  4244713 1 6   .18432954  42.44713
         9 2008 0   33428  2883324 0 1  .011593563  28.83324
         9 2010 0   20317  2934576 0 2  .006923317  29.34576
         9 2012 0   20537  2485708 0 3  .008262033  24.85708
         9 2014 1   26551  2542588 0 4   .01044251  25.42588
         9 2016 1  180240  2611007 2 5  .069030836  26.11007
         9 2018 1  283609  2605612 3 6   .10884544  26.05612
        29 2008 0  174385  4196682 0 1   .04155307  41.96682
        29 2010 0  244438  4246249 0 2   .05756563  42.46249
        29 2012 0  268191  4288488 0 3  .062537424  42.88488
        29 2014 0  253058  4295224 0 4   .05891614  42.95224
        29 2016 0  288438  4249579 0 5   .06787449  42.49579
        29 2018 0  242475  4272960 2 6   .05674638   42.7296
        30 2008 1   25699   738982 0 1   .03477622   7.38982
        30 2010 1   28198   743611 0 2   .03792036   7.43611
        30 2012 1   36534   757812 0 3   .04820985   7.57812
        30 2014 1   26853   768703 0 4  .034932867   7.68703
        30 2016 1   56547   797145 1 5    .0709369   7.97145
        30 2018 1   70739   806204 1 6    .0877433   8.06204
        32 2008 0  117648  1678550 0 1  .070089065   16.7855
        32 2010 0   39061  1691318 0 2  .023095006  16.91318
        32 2012 0  138368  1728060 0 3    .0800713   17.2806
        32 2014 0   71961  1796443 0 4   .04005749  17.96443
        32 2016 0  136014  1872376 2 5   .07264246  18.72376
        32 2018 0  143882  1983453 3 6   .07254117  19.83453
        34 2008 0  195773  5782155 0 1   .03385814  57.82155
        34 2010 0  132352  5952583 0 2   .02223438  59.52583
        34 2012 0  520206  6039623 0 3    .0861322  60.39623
        34 2014 0       .  6152634 0 4           .  61.52634
        34 2016 0  867555  6238436 1 5    .1390661  62.38436
        34 2018 0 1152284  6342876 1 6   .18166585  63.42876
        35 2008 0    2765  1365249 2 1 .0020252715  13.65249
        35 2010 0    3383  1405926 3 2  .002406243  14.05926
        35 2012 0   24572  1430475 3 3  .017177511  14.30475
        35 2014 0   37411  1444857 3 4   .02589253  14.44857
        36 2008 0       . 11284545 0 1           . 112.84545
        36 2010 0  578320 11285830 0 2   .05124302  112.8583
        36 2012 0  638065 11248617 0 3   .05672386 112.48617
        36 2014 0  410307 11318198 0 4  .036251973 113.18198
        36 2016 0  825007 11947568 0 5  .069052294 119.47568
        36 2018 0 1232349 12194360 3 6   .10105893  121.9436
        37 2008 1  759954  6457000 0 1    .1176946     64.57
        37 2010 1  506608  6536601 0 2   .07750328  65.36601
        37 2012 1  616206  6677693 0 3   .09227828  66.77693
        37 2014 1  537088  7025333 0 4   .07645018  70.25333
        37 2016 1 1108923  7267042 2 5    .1525962  72.67042
        37 2018 1 1264821  7509231 3 6    .1684355  75.09231
        40 2008 0  188312  2301848 0 1   .08180905  23.01848
        40 2010 0   94443  2348718 0 2   .04021045  23.48718
        40 2012 0  144183  2400358 0 3   .06006729  24.00358
        40 2014 0   84461  2451972 0 4   .03444615  24.51972
        40 2016 0  231263  2498178 0 5   .09257267  24.98178
        40 2018 0  249945  2504253 1 6    .0998082  25.04253
        end
        label values interven interven
        label def interven 0 "Normal", modify
        label def interven 1 "Tech Asst", modify
        label def interven 2 "Letter", modify
        label def interven 3 "Agreement", modify



        • #5
          With such large counts there may be no need for a nonlinear model. If you use one, it should be xtpoisson with fixed effects and vce(robust). But you can use log(y) in a linear fixed effects estimation as a comparison. You can then include log(pool) as an explanatory variable. Because you do have a natural upper bound (pool) for y (forms), in principle you should use binomial regression. But it's harder to do a fixed effects type analysis. With a pretty small N, I would start with log(forms) and log(pool) and use fixed effects -- assuming your intervention variable changes over time.
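          Sketched with this thread's variable names (constructing the logs and the exact covariate list are assumptions on my part):

          Code:
          gen lnforms = ln(forms)
          gen lnpool  = ln(pool)
          xtset statefip year
          * Linear fixed effects on the log scale
          xtreg lnforms i.year c.lnpool i.policy i.interven, fe vce(cluster statefip)
          * Exponential-mean counterpart; coefficients are directly comparable
          xtpoisson forms i.year c.lnpool i.policy i.interven, fe vce(robust)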



          • #6
            Originally posted by Jeff Wooldridge View Post
            With such large counts there may be no need for a nonlinear model. If you use one, it should be xtpoisson with fixed effects and vce(robust). But you can use log(y) in a linear fixed effects estimation as a comparison. You can then include log(pool) as an explanatory variable. Because you do have a natural upper bound (pool) for y (forms), in principle you should use binomial regression. But it's harder to do a fixed effects type analysis. With a pretty small N, I would start with log(forms) and log(pool) and use fixed effects -- assuming your intervention variable changes over time.
            Thank you. After running -xtpoisson forms i.year c.pool i.policy i.interven, fe vce(robust) i(statefip)- how do I convert the predicted outcome back to the original unit?
            Also, I notice no constant in the results. Is that normal?
            Last edited by Doug Hess; 05 Jan 2022, 13:26.



            • #7
              Originally posted by Jeff Wooldridge View Post
              With such large counts there may be no need for a nonlinear model. If you use one, it should be xtpoisson with fixed effects and vce(robust). But you can use log(y) in a linear fixed effects estimation as a comparison. You can then include log(pool) as an explanatory variable. Because you do have a natural upper bound (pool) for y (forms), in principle you should use binomial regression. But it's harder to do a fixed effects type analysis. With a pretty small N, I would start with log(forms) and log(pool) and use fixed effects -- assuming your intervention variable changes over time.
              Here's my analysis code using -reg- and then -xtreg-. I included -reg- because I don't know how to get from the coefficients in -xtreg- with the log of the dep variable back to a count of forms. I also don't know how to check residuals with coef in log form. Ideally, I'd also like to check the contrast between the intervention levels of "technical assistance" vs "agreement."

              Thanks.

              Code:
               gen lnforms = ln(forms)
               gen lnpool  = ln(pool)
               reg lnforms i.statefip i.year c.lnpool i.policy* i.interven, cluster(statefip)
               * Retransform fitted values to the count scale, applying the
               * normal-theory smearing factor exp(rmse^2/2)
               predict yhat if e(sample)
               replace yhat = exp(yhat) if e(sample)
               replace yhat = yhat*exp(e(rmse)^2/2) if e(sample)

               sum yhat pool ln* if e(sample)

              [output for reg, predict, and replace cmds omitted]
               Variable |  Obs       Mean   Std. Dev.       Min       Max
               ---------+------------------------------------------------
               yhat     |   84   583184.5    676680.2  5649.503   3184063
               pool     |   84    6014102     6452892    738982  2.72e+07
               lnforms  |   84   12.32938    1.527221  7.924796  14.91323
               lnpool   |   84   15.20726    .8809382  13.51303  17.11923
              Code:
              xtset statefip year
              xtreg lnforms i.year c.lnpool i.policy* i.inter, cluster(statefip) fe
               Fixed-effects (within) regression               Number of obs    =        84
               Group variable: statefip                        Number of groups =        13
               R-sq:                                           Obs per group:
                    within  = 0.6045                                        min =         4
                    between = 0.7049                                        avg =       6.5
                    overall = 0.4173                                        max =         7
                                                               F(12,12)         =     36.50
               corr(u_i, Xb) = -0.9762                         Prob > F         =    0.0000
               (Std. Err. adjusted for 13 clusters in statefip)
               (Coef. for years omitted)
               ----------------------------------------------------------------------------
                            |               Robust
                    lnforms |      Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
               -------------+--------------------------------------------------------------
                     lnpool |  -4.273417   2.020297   -2.12   0.056   -8.675266    .1284321
                  1.policy1 |  -.3265637   .3595665   -0.91   0.382   -1.109992    .4568644
                  1.policy2 |   .6061889   .2532976    2.39   0.034    .0543007    1.158077
                      inter |
                  Tech Asst |   .8046703   .2888487    2.79   0.016     .175323    1.434018
                     Letter |   .2817154   .2597435    1.08   0.299    -.284217    .8476479
                  Agreement |   1.400928   .5139866    2.73   0.018     .281048    2.520809
                      _cons |    76.7993   30.68278    2.50   0.028    9.947257    143.6513
               -------------+--------------------------------------------------------------
                    sigma_u |  5.149558
                    sigma_e |  .5817182
                        rho |  .98739977   (fraction of variance due to u_i)
              Last edited by Doug Hess; 05 Jan 2022, 16:44.



              • #8
                What question are you trying to answer? How come you're interested in fitted values? Isn't this a policy analysis?

                To first order, a linear model for log(forms) is similar to an exponential model for forms. So compare the xtreg above to the following:

                Code:
                xtset statefip year
                xtpoisson forms i.year c.lnpool i.policy* i.inter, fe vce(robust)
                The coefficients on the policy variables seem large in the log-linear model, so they might not be so similar to the exponential model estimated by Poisson FE. But I'm guessing they're similar.



                • #9
                  I'm trying to answer the question: What effect did the policy changes (made by states voluntarily) and the interventions (imposed by advocates by litigation or threat of litigation) have? Thus, I need to report the impact in the number of forms.

                  Code:
                  xtpoisson forms i.year c.lnpool i.policy* i.inter, fe vce(robust)
                  Iteration 0: log pseudolikelihood = -6921699.5
                  Iteration 1: log pseudolikelihood = -2195604.1
                  Iteration 2: log pseudolikelihood = -2010131.7
                  Iteration 3: log pseudolikelihood = -2009868.4
                  Iteration 4: log pseudolikelihood = -2009868.4
                   Conditional fixed-effects Poisson regression    Number of obs    =        84
                   Group variable: statefip                        Number of groups =        13
                                                                   Obs per group:
                                                                                min =         4
                                                                                avg =       6.5
                                                                                max =         7
                                                                   Wald chi2(12)    =  1.33e+10
                   Log pseudolikelihood = -2009868.4               Prob > chi2      =    0.0000
                   (Std. Err. adjusted for clustering on statefip)
                   ----------------------------------------------------------------------------
                                |               Robust
                          forms |      Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
                   -------------+--------------------------------------------------------------
                           year |
                           2010 |  -.1966024   .2522316   -0.78   0.436   -.6909673    .2977626
                           2012 |   .0697461    .189955    0.37   0.713   -.3025588     .442051
                           2014 |   .1323234    .259001    0.51   0.609   -.3753091     .639956
                           2016 |    .725338   .4066185    1.78   0.074   -.0716196    1.522296
                           2018 |   .8284978   .5896272    1.41   0.160   -.3271504    1.984146
                           2020 |   1.036247   .5799894    1.79   0.074   -.1005114    2.173005
                         lnpool |  -3.392932   2.993424   -1.13   0.257   -9.259934    2.474071
                      1.policy1 |  -.5430835   .1191101   -4.56   0.000   -.7765349   -.3096321
                      1.policy2 |   .4928936   .1145222    4.30   0.000    .2684341     .717353
                          inter |
                      Tech Asst |   .2773952   .1710067    1.62   0.105   -.0577717    .6125622
                         Letter |  -.0121658   .1546375   -0.08   0.937   -.3152496     .290918
                      Agreement |   .3247397   .1463048    2.22   0.026    .0379877    .6114918



                  • #10
                    To be clearer, the funders of the research will want findings in "number of forms." They won't know how to interpret--if anybody does--coefficients from xtpoisson.

                    Also, thank you for your help with this. I'm posting on Facebook that a "famous author" is responding to my questions. : )
                    Last edited by Doug Hess; 05 Jan 2022, 17:07.



                    • #11
                       The coefficients in either the log-linear or the exponential model have a percentage interpretation, so you can state that, "on average, the number of filed forms went up xx percent." But if you want effects in numbers of forms, definitely use xtpoisson and the margins command on the policy variables.



                      • #12
                        Originally posted by Jeff Wooldridge View Post
                        The coefficients in either the log-linear or the exponential model have a percentage interpretation, so you can state that, "on average, the number of filed forms went up xx percent." But if you want effects in numbers of forms, definitely use xtpoisson and the margins command on the policy variables.
                        Thanks. But that is the problem: after -xtpoisson, fe-, running -margins interv, predict(nu0)- produces the notes "numerical derivatives are approximate" and "nearby values are missing." The margins table also includes bizarre results:
                                     |     Margin   Std. Err.     z    P>|z|   [95% Conf. Interval]
                        -------------+-----------------------------------------------------------
                               inter |
                              Normal |   9.60e-22   3.92e-20   0.02   0.980   -7.59e-20   7.78e-20
                           Tech Asst |   1.27e-21   5.19e-20   0.02   0.981   -1.00e-19   1.03e-19
                              Letter |   9.48e-22   3.87e-20   0.02   0.980   -7.50e-20   7.69e-20
                           Agreement |   1.33e-21   5.42e-20   0.02   0.980   -1.05e-19   1.08e-19



                        • #13
                          Dear Doug Hess,

                          The problem is that margins cannot meaningfully be used after this command. I have been asking Stata to deal with this for some time now, but without success:

                          Originally posted by Joao Santos Silva View Post
                          As discussed here, margins should not be available after nonlinear models with fixed effects are estimated. The explanation for that is simple: any interesting quantity that we may want to compute will depend on the values of the fixed effects, which are not estimated by these commands. Therefore, margins computes something that most of the time is meaningless. This could be done in a future update, but at least it would be good to have this looked into in the next version.
                          Your example provides another illustration of the problem and hopefully this time someone at StataCorp will look into this.

                          Best wishes,

                          Joao



                          • #14
                            Originally posted by Joao Santos Silva View Post
                            Dear Doug Hess,
                            The problem is that margins cannot meaningfully be used after this command. I have been asking Stata to deal with this for some time now, but without success...
                            Your example provides another illustration of the problem and hopefully this time someone at StataCorp will look into this.
                            Thanks, Joao. The documentation and the examples given in the Stata manual need elaboration. I'm hoping there's a solution in Hilbe's book.
                            Can I ask when it's appropriate to use the PA option instead of FE?



                            • #15
                              In the case of multiplicative heterogeneity -- as with the FE Poisson setting -- calculation of average partial effects can make sense. My former student, Robert Martin (now at BLS), did part of his dissertation on this. In particular, it is possible to estimate the mean of the heterogeneity and then replace c(i) [the notation I use] with E[c(i)]. Bob has a couple of ways of doing that. What Joao is referring to, I think, is inserting estimates of c(i) in place of the c(i). These are poor estimates because of small T; however, averaged across i they become good estimates of E[c(i)]. I should look into how Stata is using margins after xtpoisson.

                              Another possibility is to use a correlated random effects approach and then apply pooled Poisson. That would be essentially the PA option, except you include the time averages of the x(i,t) variables. So like this:

                              Code:
                              egen x1bar = mean(x1), by(statefip)
                              egen xKbar = mean(xK), by(statefip)
                               poisson y x1 ... xK z1 ... zJ x1bar ... xKbar i.year, vce(cluster statefip)
                              margins, dydx(*)
                              If we replace poisson with reg this would give the usual linear FE estimates. There would be none of the problem that Joao is referring to. Hopefully the coefficients on the xj are similar to Poisson FE.
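                              A concrete version of that template using this thread's variables might look as follows (which time-varying covariates to include and average is an assumption on my part; adjust to your specification):

                              Code:
                              * Correlated random effects (Mundlak) pooled Poisson:
                              * add unit means of the time-varying covariates
                              egen pool100k_bar = mean(pool100k), by(statefip)
                              egen policy_bar   = mean(policy), by(statefip)
                              poisson forms c.pool100k i.policy i.interven ///
                                  c.pool100k_bar c.policy_bar i.year, vce(cluster statefip)
                              * Average partial effects in units of forms
                              margins, dydx(policy interven)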

