  • zero-inflated and right-censored count data

    Dear all statalists,
    Thank you for clicking on my post.

    What is the proper way to deal with zero-inflated and right-censored count data?

    I have plotted a histogram of the count data, which take integer values from 0 to 30, and found excess zeros and bunching at 30.
    I have used the zinb command to deal with the zero inflation, but it does not seem to have a ul(30) option like the one implemented in cpoisson.
    Please let me know how to deal with this in a practical way.

    Thank you for taking the time to read my post.

  • #2
    It would be helpful if you could show us the histogram.

    Best wishes,

    Joao


    • #3
      In reply to Joao Santos Silva (#2):

      Dear Mr. Silva,

      Thank you for replying to my post.
      I have attached a histogram of the three groups.

      [Attached image: histo_2.jpg]

      Sincerely,


      • #4
        Thanks for providing the image.

        Is 30 the largest value the variable can take, or is it just the maximum in the sample? If it is just the sample maximum, I would start with a simple Poisson regression (the histogram does not show any evidence of zero inflation, and cases where a zero-inflated model is suitable are rare). If it is the maximum the variable can take, I suggest you use a fractional logit unless you need to compute the probabilities of some events.
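
        A minimal sketch of the two routes described above, not code from the thread. The data are simulated, and the names y, x and share are hypothetical stand-ins for the poster's variables; fracreg requires Stata 14 or later.

        ```stata
        * Simulated stand-in data: y, x and the bound of 30 are assumptions
        clear
        set seed 12345
        set obs 500
        generate double x = rnormal()
        generate int y = rbinomial(30, invlogit(0.5 + 1.5*x))

        * Case 1: 30 is only the sample maximum -> Poisson with robust standard errors
        poisson y x, vce(robust)

        * Case 2: 30 is the largest possible value -> fractional logit on the share y/30
        generate double share = y/30
        fracreg logit share x

        * Fitted conditional mean of the share, rescaled to the 0-30 count scale
        predict double share_hat
        generate double y_hat = 30*share_hat
        ```

        The fractional logit models only the conditional mean of the share, which is why it is not suited to computing the probabilities of particular counts.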

        Best wishes,

        Joao


        • #5
          In reply to Joao Santos Silva (#4):

          Dear Mr. Silva,

          Thank you for your kindness and supportive advice.
          When I divide the count data by 30, which is the largest value the variable can take, and fit a fractional logit regression, the model does indeed seem to fit well.

          If it is not too much trouble, please let me know if you have any additional information.

          If I still want to use a zero-inflated model, is there an option to deal with right-censoring, as in cpoisson?

          Sincerely,


          • #6
            Please check the help file of the flogit command for references.

            I do not think there is a way to impose an upper bound when using zip, but note that your data appears to be bounded, not censored.
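
            To make the bounded/censored distinction concrete, here is a hedged sketch on simulated data (y and x are hypothetical variables, not from the thread). A binomial GLM with 30 trials treats 30 as the largest possible value; its point estimates coincide with those of a fractional logit on y/30, because the two quasi-log-likelihoods differ only by the factor 30. cpoisson with ul(30), by contrast, treats a recorded 30 as "30 or more".

            ```stata
            * Simulated stand-in data: y, x and the bound of 30 are assumptions
            clear
            set seed 12345
            set obs 500
            generate double x = rnormal()
            generate int y = rbinomial(30, invlogit(0.5 + 1.5*x))

            * Bounded outcome: E[y|x] = 30*invlogit(xb), robust standard errors
            glm y x, family(binomial 30) link(logit) vce(robust)

            * Censored outcome (for contrast only): values of 30 are read as "30 or more"
            cpoisson y x, ul(30)
            ```

            Which specification is appropriate depends on how the variable was generated, not on the shape of the histogram alone.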

            Best wishes,

            Joao


            • #7
              You could try a zero-and-one-inflated beta model (zoib, available on SSC). However, even though I wrote it, I am actually gravitating away from that. I find the model often too fragile, whereas a fractional logit is often more robust. So I would only do that if I were really interested in those 0s and 1s (well, 30s in your case), and even then I would be careful.
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------


              • #8
                In reply to Joao Santos Silva (#6):

                Dear Mr. Silva,

                Thank you for adding to my understanding.

                Sincerely,
                Makoto


                • #9
                  In reply to Maarten Buis (#7):

                  Dear Mr. Buis,

                  Thank you for your kindness and new point of view.
                  I'll consider it again.


                  • #10
                    In reply to Maarten Buis (#7):

                    Maarten,

                    Just curious, Stata has had a beta regression (betareg) since version 14 (I believe). That native command can only handle values of the DV between 0 and 1 exclusive, i.e. an observation can't have proportions of 0 or 1. Is your version different?
                    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

                    When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.


                    • #11
                      Yes, the zeros and ones have their own part in the model.
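
                      A rough sketch of the difference, on simulated data with a hypothetical variable share = y/30 (not code from the thread). betareg (official Stata, version 14 or later) only accepts 0 < share < 1, so the boundary values have to be excluded, whereas zoib (user-written, from SSC) models them in separate equations. The zeroinflate() and oneinflate() option names are given as recalled from the zoib help file and are an assumption to verify with help zoib.

                      ```stata
                      * zoib is user-written; install it once with: ssc install zoib
                      * Simulated stand-in data: share, x and the bound of 30 are assumptions
                      clear
                      set seed 12345
                      set obs 500
                      generate double x = rnormal()
                      generate double share = rbinomial(30, invlogit(0.5 + 1.5*x))/30

                      * betareg: boundary values must be left out of the estimation sample
                      betareg share x if share > 0 & share < 1

                      * zoib: the zeros and ones get their own equations
                      zoib share x, zeroinflate(x) oneinflate(x)
                      ```

                      With three equations to estimate, convergence can be delicate, which fits the earlier caution that the model is often fragile.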
                      Maarten L. Buis
