  • Question about Censored data, not Stata specific

    Hopefully this kind of question is appropriate. I've done an analysis in which the outcome is a measure of affiliation with 12-step programs. These are not rate data based only on frequency of attendance. It's a published measure. The lower limit is 0 if the participant has never attended a 12-step meeting. All additional data are contingent on attendance. About 25% of the observations are at the lower limit of 0. Most definitions of censoring I've seen would say that a variable is censored if values less than or greater than some threshold of measurement are coded at the limit. In my case, that would imply that values < 0 are possible but that we couldn't observe them. Here, 0 is the lower limit and values less than 0 are not possible. However, I've also seen references in which censoring was used to describe data at actual boundaries, e.g., students can't score < 0% or > 100% on an exam. I used censored regression to analyze these data and have a reviewer who insists the data are not censored. My question: Are my data censored at 0? I considered using a two-step selection model.
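
    The threshold definition of censoring described above can be illustrated with a small simulation. This is only a sketch on made-up data: the latent variable, its mean and standard deviation, and the sample size are all assumptions chosen so that roughly a quarter of the values pile up at 0. It does not use the 12-step measure. Under censoring, every recorded 0 stands in for an unobserved value below 0. At a natural bound, a 0 is the true value and nothing is hidden.

```python
import random
import statistics

random.seed(12345)  # fixed seed so the run is reproducible

# Hypothetical latent outcome that can genuinely fall below zero (assumed values).
latent = [random.gauss(1.0, 1.5) for _ in range(10_000)]

# Censoring at 0: values below the limit exist but are recorded at the limit.
observed = [max(0.0, value) for value in latent]

share_at_limit = sum(value == 0.0 for value in observed) / len(observed)
share_below_zero = sum(value < 0.0 for value in latent) / len(latent)

print(f"share recorded at 0:     {share_at_limit:.3f}")
print(f"share truly below 0:     {share_below_zero:.3f}")
print(f"mean of latent values:   {statistics.mean(latent):.3f}")
print(f"mean of recorded values: {statistics.mean(observed):.3f}")
```

    In this setup the recorded mean is pulled upward relative to the latent mean, which is the distortion censored regression is meant to correct. If no values below 0 exist to begin with, there is no latent mean to recover.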

  • #2
    I would not label a variable where natural bounds are attained in the data as censored. Is a (0, 1) indicator censored? Is percent of males censored if data points include 0 or 100%? Is heads as fraction of coin flips (tosses) censored because the first value must be 0 or 1? You have an ordered scale and I have never heard anyone argue that such scales are censored.

    Cue to expose my ignorance or misunderstanding.


    • #3
      It's no doubt my ignorance that has been exposed. It's a pretty strange measure and I'm doing the data analysis for colleagues. The first question asks how many times in your life you have attended a 12-step program, with no upper bound. The developers of the scale give a scoring algorithm in which scores of 0, .25, .5, .75, and 1 are assigned depending on how many meetings were attended. I don't remember the specific number of meetings used to define these values other than 0 = never attended a meeting. Participants are then asked the number of meetings in the past 12 months, with the same scoring, followed by several dichotomous items. The possible range is 0 to 9 with scores like .25, .5, 3.75, etc.

      I'm not really worried about censoring at the high end. The highest observed score was 8 and there are relatively few observations with scores above 6. The original authors reported means and SDs, and treated the outcome as an unbounded continuous variable using OLS regression. Just under 25% of the observations are at 0. Any categorization would have no published precedent and would be arbitrary. The data are not truncated, the outcome is not a count variable, and the precedent has been to analyze it as a continuous variable. The persons who wrote the manuscript (I'm only analyzing their data) want to measure the amount of 12-step affiliation, not just something like how many meetings participants have attended. So I used censored regression with robust standard errors as the most appropriate strategy.

      The reviewer's comment was basically a terse "the data aren't censored." But I certainly see examples of censored regression being used with outcomes that are strictly bounded. One of the examples in the Stata manual uses hours worked as an outcome, in which 0 is the lower limit. I see that as analogous to the example on which I'm working. So I'm kind of at a loss as to how to proceed or respond to the reviewer.


      • #4
        I would take that manual example up with StataCorp. They don't mind (and FWIW I don't mind) facetious examples (and a few of mine have crept anonymously into the manuals) but no example should be misleading on key technical points.

        To be frank, to assign scores 0(0.25)1 to a counted response sounds utterly perverse to me. A lot of experience and various formal arguments lie behind the suggestion that Poisson regression should work well. Contrary to myth, it usually works well over a range of marginal and conditional distributions for any count-like response.
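
        For readers who want to see what fitting such a model involves, here is a minimal sketch of Poisson regression with a log link, estimated by Newton-Raphson on simulated counts. Everything in it is an assumption made for illustration: the single predictor `x`, the coefficients `TRUE_B0` and `TRUE_B1`, and the helper names `draw_poisson` and `fit_poisson`. It is not the analysis from this thread, and in Stata one would simply use `poisson` with robust standard errors.

```python
import math
import random

random.seed(2024)  # fixed seed so the run is reproducible

TRUE_B0, TRUE_B1 = 0.2, 0.5  # hypothetical coefficients for the simulation


def draw_poisson(mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, random.random()
    while product > threshold:
        count += 1
        product *= random.random()
    return count


def fit_poisson(x, y, iterations=25):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson on the Poisson likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(b0 + b1 * xi)
            g0 += yi - mu              # score for the intercept
            g1 += (yi - mu) * xi       # score for the slope
            h00 += mu                  # information matrix entries
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


x = [random.uniform(0.0, 2.0) for _ in range(2000)]
y = [draw_poisson(math.exp(TRUE_B0 + TRUE_B1 * xi)) for xi in x]

b0_hat, b1_hat = fit_poisson(x, y)
fitted_total = sum(math.exp(b0_hat + b1_hat * xi) for xi in x)
print(f"estimated intercept: {b0_hat:.3f}, estimated slope: {b1_hat:.3f}")
```

        Zeros need no special handling here: a count of 0 is an ordinary value under the model, not a stand-in for something unobserved.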


        • #5
          I would only add: improper application of a method in a published study is not an argument for repeating the mistake.
          Steve Samuels
          Statistical Consulting

          Stata 14.2
