
  • How do I interpret the result of zero-inflated Poisson regression?

    [Attached image: ZIP_result.PNG, the posted ZIP regression output]

    Hi, I used a zero-inflated Poisson (ZIP) model to estimate the impact of satisfaction Level1, Level2, Level3 and satisfaction SD1, SD2, SD3 on the number of complaints about a hotel stay.
    Specifically, I want to see the interaction effects of Level and SD as well as the main effects.

    * The numbers 1, 2, 3 after Level and SD indicate different sources of satisfaction, which I cannot describe here.

    I chose this model because the dependent variable, the number of complaints, has many zeros, and I think of these excess zeros as having two sources: no complaint at all, or a complaint that was not reported.

    My questions are:

    1. Is it appropriate to use the ZIP model?

    2. Do I just interpret the result like:
    - all types of Level have a negative effect on the number of complaints
    - all types of SD have a positive effect on the number of complaints
    - for types 1 and 3, SD mitigates the negative impact of Level on the number of complaints

    3. If I do not care about the 0 outcome, is it okay to ignore the inflate part?


  • #2
    1. Perhaps. If you are only interested in hypothesis tests of effects (which I infer may be your interest, because you are using significance stars and have chosen to effectively ignore the Level2*SD2 term), then a plain Poisson model (no zero inflation) with robust standard errors might be a simpler approach and would be just as good. On the other hand, if you are actually interested in predicting complaint rates, then niceties do matter, such as whether there really is zero inflation or whether some other model (perhaps a negative binomial, or something even more complicated) is better. For that you need to look at how well the model actually fits the data to decide whether it is appropriate.
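    A minimal sketch of both routes. The variable names complaints, Level1-3 and SD1-3 come from the question; other_covariates and inflate_model_variables are placeholders, and the stored-estimate names pois, zipm and nb are hypothetical. Information criteria are only one rough guide to relative fit, not a full assessment of whether a model is appropriate.

    ```stata
    * Simpler route for hypothesis testing: plain Poisson with robust (sandwich) SEs.
    * other_covariates is a placeholder for your own control variables.
    poisson complaints other_covariates c.Level1##c.SD1 c.Level2##c.SD2 c.Level3##c.SD3, vce(robust)
    estimates store pois

    * If prediction matters, fit candidate count models on the same sample.
    * inflate_model_variables is a placeholder for the zero-process predictors.
    zip complaints other_covariates c.Level1##c.SD1 c.Level2##c.SD2 c.Level3##c.SD3, inflate(inflate_model_variables)
    estimates store zipm
    nbreg complaints other_covariates c.Level1##c.SD1 c.Level2##c.SD2 c.Level3##c.SD3
    estimates store nb

    * Log likelihood, AIC and BIC side by side as a rough comparison of fit.
    estimates stats pois zipm nb
    ```
    
    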

    2. No, nothing like that at all. This is an interaction model, and the key point is that in an interaction model there is no such thing as "the effect of Level1" or "the effect of SD1." Rather, there is a different "effect of Level1" for each possible value of SD1 (and a different "effect of SD1" for each possible value of Level1).

    Let's just take Level1 and SD1 as an example. (Identical logic applies to Level2~SD2 and Level3~SD3.) You have to understand that in this model, the coefficient of Level1 represents the "effect" of Level1 on the (log of the) number of complaints when SD1 = 0. Similarly, the coefficient of SD1 represents the "effect" of SD1 when Level1 = 0. You cannot say that either Level1 or SD1 has a positive or a negative effect, because the effect of Level1 depends on SD1, the effect of SD1 depends on Level1, and the result can be positive or negative depending on which values of Level1 and SD1 we are talking about. So, for example, when Level1 = 1 and SD1 = 0, the marginal effect of Level1 on the log of the number of complaints is -0.122, which is negative. But if Level1 = 1 and SD1 = 5, then the marginal effect of Level1 on the log of the number of complaints is -0.122 + 5*0.030 = +0.028, which is positive. More generally, at any value SD1 = S, the marginal effect of Level1 on the log of the number of complaints is -0.122 + 0.030*S.
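    The same arithmetic written as a derivation of the count (log-scale) equation, restricted to the Level1~SD1 terms. The beta symbols are labels introduced here for the posted coefficients (beta_L1 = -0.122 for Level1, beta_LS1 = 0.030 for Level1*SD1), and the dots stand for all other terms, which do not involve Level1:

    ```latex
    \begin{aligned}
    \log \mu &= \beta_0 + \beta_{L1}\,\mathrm{Level1} + \beta_{S1}\,\mathrm{SD1} + \beta_{LS1}\,\mathrm{Level1}\times\mathrm{SD1} + \cdots \\
    \frac{\partial \log \mu}{\partial\,\mathrm{Level1}} &= \beta_{L1} + \beta_{LS1}\,\mathrm{SD1} = -0.122 + 0.030\,\mathrm{SD1} \\
    \frac{\partial \log \mu}{\partial\,\mathrm{SD1}} &= \beta_{S1} + \beta_{LS1}\,\mathrm{Level1} \\
    \mathrm{SD1} = 5:&\quad -0.122 + 0.030 \times 5 = +0.028 \\
    \text{sign change:}&\quad -0.122 + 0.030\,\mathrm{SD1} = 0 \;\Rightarrow\; \mathrm{SD1} = 0.122 / 0.030 \approx 4.07
    \end{aligned}
    ```

    So on the log scale the effect of Level1 is negative for SD1 below about 4.07 and positive above it; whether such SD1 values actually occur depends on your data.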

    With regard to the lack of statistical significance of the coefficient of the Level2*SD2 term, that does not mean you can ignore it or pretend it is zero. That is bad statistical practice even in non-interaction models, and it is particularly bad here. Notice that this coefficient, 0.016, is only slightly smaller than the "statistically significant" coefficient of Level3*SD3, 0.019. So the impact that SD2 has on the marginal effect of Level2 is only slightly smaller than the impact that SD3 has on the marginal effect of Level3. In particular, if SD2 can take on large values, 0.016*SD2 can be large compared to -0.091, the coefficient of Level2, so the marginal effect of Level2 may well be quite sensitive to the value of SD2 even though this coefficient is not "statistically significant."

    A better interpretation is that the interaction between Level2 and SD2 is not precisely determined by the data: while 0.016 is our best estimate, there is enough sampling and measurement error that even values in the opposite direction are still reasonably consistent with the data. But we certainly cannot pretend that the interaction is 0; we can only say that it is known so imprecisely that we cannot be confident of its sign. If knowing it more precisely is important, then one would need more data, better data, or perhaps a different design.
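    To see how much the log-scale effect of Level2, -0.091 + 0.016*S, moves with SD2 = S, and how uncertain it is at each point, one can compute it with its own standard error and confidence interval using -lincom-. This sketch assumes the model was fitted with -zip- using c.Level2##c.SD2 and complaints as the dependent variable (so the count equation is named complaints); the SD2 values 0, 2, 4, 6 are placeholders for values meaningful in your data.

    ```stata
    * Run right after the -zip- model that used c.Level2##c.SD2.
    * For each hypothetical SD2 value s, estimate b_Level2 + s*b_(Level2#SD2)
    * on the log scale, with its standard error and 95% confidence interval.
    foreach s of numlist 0 2 4 6 {
        display as text _newline "Marginal effect of Level2 (log scale) at SD2 = `s'"
        lincom _b[complaints:Level2] + `s'*_b[complaints:c.Level2#c.SD2]
    }
    ```

    If the intervals at the larger SD2 values are wide and straddle zero, that is the "imprecisely determined" situation described above, not evidence that the interaction is zero.
    
    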

    Continuous-by-continuous interactions are difficult to grasp, all the more so in a non-linear model such as this one. I advise you to go back and re-run your regression using factor variable notation (-help fvvarlist-), and then evaluate the interactions using the -margins- command. Select some values of Level1 and SD1 (and similarly for 2 and 3) that are interesting or meaningful in your data (you would know what those are). Then evaluate the predicted number of complaints at all combinations of those values, as well as the marginal effects of Level1 and of SD1 at each of those values. Use -marginsplot- to visualize them. So something like this:

    Code:
    * Placeholders: other_covariates, inflate_model_variables, interesting_values_of_*
    zip complaints other_covariates c.Level1##c.SD1 c.Level2##c.SD2 c.Level3##c.SD3, inflate(inflate_model_variables)
    * Predicted number of complaints at combinations of Level1 and SD1
    margins, at(Level1 = (interesting_values_of_Level1) SD1 = (interesting_values_of_SD1))
    marginsplot, name(predicted_outcomes, replace)
    * Marginal effect of Level1 at selected values of SD1
    margins, dydx(Level1) at(SD1 = (interesting_values_of_SD1))
    marginsplot, name(level1_marginal_effects, replace)
    * Marginal effect of SD1 at selected values of Level1
    margins, dydx(SD1) at(Level1 = (interesting_values_of_Level1))
    marginsplot, name(sd1_marginal_effects, replace)
    * Repeat the -margins- and -marginsplot- commands for Level2~SD2 and Level3~SD3

    Note that the predicted values and marginal effects you get from this will be the predicted values and marginal effects on the number of complaints, not the logarithm.
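    A sketch of why the count-scale numbers differ from the log-scale coefficients. For -zip-, the default prediction used by -margins- is the expected number of events, (1 - pi)*exp(xb), where pi is the probability of a structural zero from the inflate() equation (logit link by default). Assuming Level1 and SD1 are not among the inflate() variables (if they are, pi also depends on them and an extra term appears), the chain rule gives:

    ```latex
    \begin{aligned}
    E[y \mid x, z] &= (1-\pi)\,\exp(x\beta), \qquad \pi = \frac{\exp(z\gamma)}{1+\exp(z\gamma)} \\
    \frac{\partial E[y \mid x, z]}{\partial\,\mathrm{Level1}} &= (1-\pi)\,\exp(x\beta)\,\bigl(\beta_{L1} + \beta_{LS1}\,\mathrm{SD1}\bigr) = E[y \mid x, z]\,\bigl(\beta_{L1} + \beta_{LS1}\,\mathrm{SD1}\bigr)
    \end{aligned}
    ```

    So on the count scale the sign is the same as on the log scale, but the size is scaled by the expected count, which depends on every covariate; -margins, dydx()- averages this quantity over the observations in the estimation sample.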

    Do read -help fvvarlist- to understand factor variable notation and how it works. For a very readable introduction to the -margins- command, I suggest you start with Richard Williams' excellent https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It includes clear explanations and several worked examples, including interaction models. After you have that down, you can learn about some of the more advanced features of -margins- by reading the corresponding section of the PDF documentation that is installed with Stata.

    3. If you have no interest in the process that generates 0 complaints, then, yes, feel free to ignore the inflate results.



    • #3
      Thank you so much for the helpful suggestions.
      But I get an error message saying that the marginsplot command is unrecognized, and I think this is because I have Stata 11.
      Is there any other way to present the graphics, or should I get the newest version?



      • #4
        I don't have Stata 11 around anymore. In Stata 14 and 15 (and maybe earlier) the -margins- command has an undocumented -saving()- option that lets you save the margins output in a .dta file, which you can then -use- and you can plot the results using the usual graph commands. Whether this is available in Stata 11, I do not recall.

        If you are planning to use Stata regularly going forward and you do analyses of the type you describe here with some frequency, then it would probably be well worth your while to upgrade to the newest version (15.1).
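        One workaround that should need only commands available in Stata 11 (-lincom-, -postfile-, -twoway-), sketched here without having been tested in Stata 11: compute the log-scale marginal effect of Level1 over a grid of SD1 values, save the estimates and confidence limits to a small dataset, and plot them with ordinary graph commands. It assumes the last estimation was the -zip- model with c.Level1##c.SD1 and complaints as the dependent variable; the grid 0 to 10 and the file name me_level1 are hypothetical choices. Run it from a do-file, since /// continuation lines do not work interactively.

        ```stata
        * Collect the log-scale marginal effect of Level1 at SD1 = 0, 1, ..., 10.
        tempname memhold
        postfile `memhold' sd1 b lb ub using me_level1, replace
        forvalues s = 0/10 {
            quietly lincom _b[complaints:Level1] + `s'*_b[complaints:c.Level1#c.SD1]
            * 95% normal-based limits from the estimate and its standard error
            post `memhold' (`s') (r(estimate)) (r(estimate) - invnormal(0.975)*r(se)) ///
                (r(estimate) + invnormal(0.975)*r(se))
        }
        postclose `memhold'

        * Plot the saved results, then return to the analysis data.
        preserve
        use me_level1, clear
        twoway (rcap lb ub sd1) (connected b sd1), ///
            yline(0) xtitle("SD1") ytitle("Marginal effect of Level1 (log scale)") ///
            legend(order(2 "Estimate" 1 "95% CI"))
        restore
        ```

        This plots log-scale effects, which are linear in SD1 by construction; count-scale effects would still come from -margins-.
        
        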



        • #5
          Irrespective of whether you need to upgrade, -coefplot- (written by Ben Jann and available from SSC) is compatible with Stata 11. Ben Jann's website has a section describing how to plot results from -margins-. Note that you must use the -post- option of -margins- when plotting its results with -coefplot-.
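          A minimal sketch of the -coefplot- route, assuming the -zip- model with c.Level1##c.SD1 has just been fitted; the SD1 values 0 2 4 6 and the stored-estimates name zipfit are placeholders. Because -post- replaces the model results in e(), the model is stored first and restored afterwards.

          ```stata
          * Install coefplot from SSC (replace overwrites an existing copy).
          ssc install coefplot, replace
          * Keep the zip results, since -margins, post- will overwrite e().
          estimates store zipfit
          * Count-scale marginal effects of Level1 at placeholder SD1 values, posted to e().
          margins, dydx(Level1) at(SD1 = (0 2 4 6)) post
          * One point estimate and confidence interval per SD1 value, with a zero line.
          coefplot, vertical yline(0)
          * Bring back the zip results before running further -margins- commands.
          estimates restore zipfit
          ```
          
          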
          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

          When presenting code or results, please use code delimiters to format them: use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
