  • How to adjust for large number of Observations being "ZERO" in my PROBIT model

    Dear Friends;
    I am running a probit model on a dataset where the dependent variable is zero for a very large share of observations (98%), with "0" representing "Not faced Violence" and "1" representing "Faced Violence". I tried using "zioprobit", but the results show "Cut 1", and I am not sure why a probit model would have a "Cut 1".
    1) Is this the correct modelling approach, given that we want to adjust for the large number of zeros in the probit model?
    2) Does Stata's "zioprobit" command automatically distinguish whether the dependent variable is binary or has multiple categories/values?
    3) Also, why does the "z" statistic show a negative sign while the coefficient is positive?
    Regards
    Last edited by Shashank Shukla; 23 Oct 2019, 01:36.

  • #2
    Dear Shashank Shukla,

    The question is why you have so many zeros. It may well be that a regular binary choice model such as the probit is perfectly adequate to describe your data.

    Best wishes,

    Joao


    • #3
      This is a conflict-related question where most of the respondents have not faced violence. We are experimenting with the zero-inflated ordered probit, i.e. "zioprobit", but are not entirely sure it is appropriate, hence the question.


      • #4
        Since the DV is binary and not ordinal, I am not sure zioprobit is appropriate. You may want to use programs for rare events, like firthlogit. Some options are discussed at

        https://www3.nd.edu/~rwilliam/stats3/RareEvents.pdf

        Other points: ordered probit and logit models have cutpoints rather than intercepts. For a binary DV, the cutpoint will be the intercept * -1.
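
To illustrate the cutpoint remark, here is a minimal Python sketch (not Stata output). The intercept and slope values are hypothetical, chosen only for illustration. It checks that a binary probit with intercept b0 gives the same P(y = 1) as an ordered probit with a single cutpoint equal to -b0.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binary_probit_prob(x, intercept, slope):
    """P(y = 1) in a binary probit: Phi(intercept + slope * x)."""
    return normal_cdf(intercept + slope * x)


def ordered_probit_top_prob(x, cut1, slope):
    """P(y = 1) in a two-category ordered probit: 1 - Phi(cut1 - slope * x)."""
    return 1.0 - normal_cdf(cut1 - slope * x)


# Hypothetical parameter values, not taken from the thread.
intercept, slope = -2.05, 0.30
cut1 = -intercept  # the single cutpoint is the intercept times -1

# Largest disagreement between the two parameterisations over a few x values.
max_gap = max(
    abs(binary_probit_prob(x, intercept, slope)
        - ordered_probit_top_prob(x, cut1, slope))
    for x in (-1.0, 0.0, 2.5)
)
print(max_gap)  # differs from zero only by floating-point rounding
```

The two forms agree because 1 - Phi(-a) = Phi(a), so "Cut 1" carries the same information as the probit constant with the sign flipped.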

        Showing your code and output might make it easier to answer your Qs. See pt 12 of the Statalist FAQ on asking Qs effectively.
        -------------------------------------------
        Richard Williams, Notre Dame Dept of Sociology
        StataNow Version: 18.5 MP (2 processor)

        EMAIL: [email protected]
        WWW: https://www3.nd.edu/~rwilliam


        • #5
          I would never try something like zero-inflated ordered probit before I was confident that probit was not a good idea.


          • #6
            Dear Shashank Shukla,

            Thanks for clarifying. Zero inflation would only be appropriate if it is impossible for some respondents to face violence, which probably is not the case, so a standard logit/probit sounds like a good starting point. How many observations do you have?
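
As a rough sketch of what zero inflation assumes, the Python snippet below treats the outcome as a mixture: a share of respondents who can never record a 1, plus everyone else following an ordinary probit. The fixed `share_never` and the index value are hypothetical simplifications for illustration; an actual zero-inflated estimator would model that share rather than fix it.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_zero_probit(index):
    """P(y = 0) under a plain probit with linear index x'b."""
    return 1.0 - normal_cdf(index)


def prob_zero_inflated(index, share_never):
    """P(y = 0) when a share of respondents can never record a 1.

    The 'never' group contributes zeros with certainty; the remaining
    respondents contribute zeros with the usual probit probability.
    """
    return share_never + (1.0 - share_never) * prob_zero_probit(index)


index = -2.0  # hypothetical value of x'b, not taken from the thread

plain = prob_zero_probit(index)
inflated = prob_zero_inflated(index, share_never=0.5)
same_as_plain = prob_zero_inflated(index, share_never=0.0)

print(plain, inflated, same_as_plain)
```

With `share_never` set to 0 the mixture collapses to the plain probit, which is why the extra machinery only earns its place if a "can never face violence" group is believed to exist.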

            Best wishes,

            Joao


            • #7
              My recollection is that the literature on sparse outcomes in binary regression models (at least logit) has shown that problems arise from having a small *number* of outcomes in the rare category, not a small fraction. So, the 98/2% split would not be a matter of concern if the total sample size was large, say 2,000 or so.


              • #8
                Thank you, everyone, for your responses. I will read the reference documents suggested by Richard.
                Meanwhile, to answer some of the questions asked above:
                • The number of observations is more than 200,000, so that is not a problem. Of these, nearly 4,000 have faced violence.
                • I was thinking of a zero-inflated model because it would let me differentiate between people who will never face violence and people who are not currently facing violence but for whom that may change with a change in their socio-economic status. Does a simple probit allow such an analysis?


                • #9
                  The code and the results in a pdf.
                  Attached Files


                  • #10
                    Results & Code.pdf


                    • #11
                      Dear Shashank Shukla,

                      Thanks for the additional information. As suggested above, with that sample size, I would not worry about rare events. On the use of zero-inflated models, can we really say that someone will never face violence? I would stick to standard binary choice models.

                      Best wishes,

                      Joao


                      • #12
                        Many thanks, everyone, for your advice. On balance, I will stick to standard probit for now.
