  • Is negative binomial appropriate

    Suppose you have a group of people enrolled in a study and you would like to build a model that describes the rate of change in money per subject over a specified timeline.

    At the start of the study you randomly assign the subjects into groups of equal sizes and give different amounts of money based on the group.

    Group 1: $1000
    Group 2: $2000
    Group 3: $3000

    The participants can do whatever they want with the money, but they cannot use their own money during the study. If they choose to “invest” the money, then they can use those additional gains (e.g. lottery, gambling).

    As the researcher, you will sample a subset of each group at fixed time points during the study. Once sampled, the subjects are removed from the study. Additional covariates are recorded, such as location of purchase, cost of transaction, day sampled, and group.

    The outcome for each sampled subject will be a net gain, net loss, or no net change compared to the starting amount of money. The outcomes are overdispersed.

    Is it appropriate to use nbreg? If nbreg is appropriate, should an offset/exposure be used to account for the different starting conditions or different lengths of time a participant is in the study?

  • #2
    The first thing that is unclear to me is what the outcome variable is here. In ordinary circumstances, the outcome variable in such a study would be the amount of money gained or lost (or 0 if no change). But what you state implies that you are just observing that there is a gain or loss (or neither) but not the amount. That sounds like bad study design to me--it throws away an enormous amount of information. If it's not too late to change the study design, you should do so and observe the actual amount of gain or loss (as well as whether it is a gain or loss). Have you ever noticed that professional sports statisticians and election statisticians do not analyze wins and losses, they analyze point spreads? There's a really good reason for that.

    Next, I don't know what you mean when you say "The outcomes are overdispersed." The concept of overdispersion is relative to some anticipated distribution, most commonly the Poisson distribution. But if your outcome variable is just +, -, or 0, then this is nothing like a Poisson variable, so I don't know what distribution you might be referencing when you refer to overdispersion. In fact I can't even imagine what distribution this might refer to in this context.

    If you can modify the study design to observe magnitudes as well as signs of the investment results, then I would probably think in terms of an ordinary linear regression, perhaps with some transformation of the outcome variable if need be.

    If the data are already gathered and it is too late to retrieve the magnitude information from study records, then I would tend to view this as an ordinal variable and my first approach would be to use ordinal logistic regression. But I wouldn't be optimistic about getting insightful results from such an information-depleted outcome variable.

    In neither case do I see how -nbreg- would be applicable here.


    • #3
      Hello Dr. Schechter,

      Thanks for the response. The study I have described was actually my best attempt at describing a bacterial decay study. A known concentration of bacteria was applied onto multiple plants to measure the rate of decay over time. A subset of plants was sampled daily. The concentration of bacteria was quantified (this would be my count outcome variable). Some plants showed signs of bacterial growth (bacterial concentrations above initial inoculum levels), while some plants had bacterial decay (bacterial concentrations below initial inoculum). Either way, the bacterial concentration was determined per plant. The recovered bacterial concentrations ranged up to 8 logs on any given day.

      The outcome variable is the count of bacteria. Predictor variables include a quadratic term with time, trial (proxy for season), inoculum type, and leaf wetness.

      Negative binomial regression is a generalized linear regression model where the dependent outcome variable is a count of the number of times an event occurs. I have a count of change in bacterial concentrations at a given time interval. However, I'm not sure that this is the appropriate model. If it is not, which model is?

      Your help is greatly appreciated.


      • #4
        Well, even though the context is quite different from what #1 evoked, I think my approach would still be the same. I think you would be much better off modeling the bacterial counts themselves than modeling an artificial "event" defined by the direction (or absence) of change in the bacterial count. It might make sense to use a generalized linear model with a log-link for this given that the counts range over 8 orders of magnitude. So I'd still be thinking about something like:

        Code:
        glm bacteria_concentration i.group perhaps_other_covariates, link(log)
        margins group // FOR EXPECTED CONCENTRATIONS IN EACH GROUP

        It's not clear to me whether all plants are sampled the same amount of time after being "seeded". If they are not, then I would presume that this somehow needs to be reflected in the analysis. Perhaps the -exposure()- option of -glm- would be suitable here, though it is not the only, nor necessarily the best, way to handle this issue. That's more of a substantive question that I'm not able to advise you on.

        Other approaches are also possible. One might use final bacteria_concentration divided by initial inoculum as the outcome variable. There the model would be similar, though I would expect this variable would not be so wide ranging as the previous one, so the log link might be superfluous.
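
        A minimal sketch of the ratio approach just described, written as a do-file. The data are simulated only so that the commands run end to end; the variable names (final_conc, inoculum, ratio, group) and every value are assumptions for illustration, not taken from the study.

        ```stata
        * Toy data so the sketch runs end to end; every name and value here is hypothetical.
        clear
        set seed 12345
        set obs 90
        generate byte group = 1 + mod(_n, 3)                         // three equal-sized groups
        generate double inoculum = 1e6                                // assumed common starting concentration
        generate double final_conc = inoculum * exp(rnormal(-1, 1))   // simulated recovered concentration

        * Outcome: recovered concentration relative to the inoculum
        generate double ratio = final_conc / inoculum

        glm ratio i.group      // default Gaussian family with identity link
        margins group          // expected ratio in each group
        ```

        If the ratio still spans several orders of magnitude in the real data, adding link(log) back to the -glm- command would be the natural variation to try.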

        That said, I can imagine that what I am proposing above may miss the point, as you have not made your research goals clear. It may be that you are studying some sort of mechanism that these plants have to control the growth of this kind of bacterium, and you are trying to identify properties of that mechanism to ascertain whether, for example, there is some threshold inoculum below which the mechanism doesn't kick in, or perhaps a threshold inoculum above which it gets overwhelmed and fails. In that case, I could understand wanting to consider "growth" or "decay" or "stasis" as the actual outcomes of interest, and again I would think of ordinal logistic regression. I would code an outcome variable as 1, 2, or 3 for decay, stasis, and growth, respectively, and then run

        Code:
        ologit outcome i.group perhaps_other_covariates
        margins group // FOR PREDICTED PROBABILITIES OF EACH OUTCOME IN EACH GROUP

        I hope this is helpful.
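
        For reference, the 1/2/3 coding of the outcome variable described above could be built roughly as follows. This is only a sketch on simulated data: final_conc, inoculum, and outcome_lbl are hypothetical names, and treating "stasis" as exact equality with the inoculum is an assumption (with measured concentrations, a tolerance band around the inoculum may be more realistic).

        ```stata
        * Toy data so the sketch runs end to end; every name and value here is hypothetical.
        clear
        set seed 12345
        set obs 90
        generate byte group = 1 + mod(_n, 3)
        generate double inoculum = 1e6
        generate double final_conc = round(inoculum * exp(rnormal(0, 1)))
        replace final_conc = inoculum in 1/10      // force some exact ties so "stasis" appears

        * 1 = decay, 2 = stasis, 3 = growth; left missing if either concentration is missing
        generate byte outcome = 2 if !missing(final_conc, inoculum)
        replace outcome = 1 if final_conc < inoculum & !missing(final_conc, inoculum)
        replace outcome = 3 if final_conc > inoculum & !missing(final_conc, inoculum)

        label define outcome_lbl 1 "decay" 2 "stasis" 3 "growth"
        label values outcome outcome_lbl
        tabulate outcome group
        ```

        The explicit !missing() guards matter because Stata treats missing values as larger than any number, so an unguarded "final_conc > inoculum" would classify missing concentrations as growth.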
        Last edited by Clyde Schechter; 03 Jun 2018, 10:15.


        • #5
          Hello Dr. Schechter,
          I appreciate your response using the glm method, as this is consistent with the methods I've been exploring. My main concern was the difference in direction of the outcome from the starting conditions. Assuming that my data follow a Poisson process, for example the number of car accidents at a site, I cannot wrap my head around reporting a negative number of accidents, or in my case, fewer bacteria compared with starting conditions while also observing more bacteria than starting conditions. If the event of the Poisson process is the number of bacteria observed by growth, then I could simply count from my starting concentration and observe the change (20 accidents on Main street, 30 accidents on Front street during a given year). BUT in my data I have growth and death (and a combination of unobserved events due to not individually tagging the bacterial isolates) that I would like to quantify to help inform harvesting practices if bacterial contamination occurred in agricultural production systems.
          When exposed to unfavorable conditions, bacteria will die or become inactive/dormant over time or have, in general, a negative relationship with time. In my study, while being exposed to the same environmental conditions, I observed vastly different responses in bacterial concentrations. I expected to see reduced numbers over the course of my trial. This was not the case within the daily replicates that were sampled.
          Thanks for sticking with me through this process!


          • #6
            Well, a birth (growth)-death process is not a Poisson process even if its constituent birth and death processes separately are. So your instinct that using a Poisson (or negative binomial) model feels wrong is quite correct.
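
            One way to see this is a deliberately simplified calculation. It assumes (this is not stated in the thread) that over a sampling interval the numbers of births B and deaths D are independent Poisson counts with means \beta and \delta; the symbols B, D, X, \beta, and \delta are introduced here only for illustration.

            ```latex
            \begin{align*}
            B &\sim \mathrm{Poisson}(\beta), \qquad D \sim \mathrm{Poisson}(\delta), \qquad B \text{ and } D \text{ independent},\\
            X &= B - D \quad \text{(net change in count)},\\
            \mathbb{E}[X] &= \beta - \delta,\\
            \operatorname{Var}(X) &= \beta + \delta .
            \end{align*}
            ```

            A Poisson variable takes only non-negative values and has variance equal to its mean. Here X can be negative, and its variance exceeds the absolute value of its mean whenever both \beta and \delta are positive, so X cannot be Poisson. Under these assumptions the net change follows what is known as the Skellam distribution.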
