  • Logit Regression

    Hi,

    I have a sample of 1400 companies and a binary dependent variable (1 or 0). In that sample, only 11 observations are 1 (the event happens); the rest are 0 (the event doesn't happen). Will that give me any useful output?
    What should be an ideal distribution?

    Thanks!

    Regards,
    Anuja

  • #2
    The problem with your data is that your dependent variable has very low variance, and thus there is very little variance that can be explained by the explanatory variables. With only 11 events you may get something with only one explanatory variable, but I would be skeptical about the results (if you get any), because there is just too little information present in your data. More than one explanatory variable does not make sense to me in that situation.

    The ideal distribution would be 700 events and 700 non-events, i.e. 50% events and 50% non-events. That way you maximize the variance in the dependent variable.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
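
    To put numbers on the variance argument: a binary variable with event share p has variance p(1 - p). A quick check in Stata, using only the counts quoted in this thread:
    Code:
    // 11 events out of 1400: variance of the dependent variable
    display 11/1400 * (1 - 11/1400)
    
    // 700 events out of 1400 (the 50/50 case)
    display 0.5 * (1 - 0.5)
    That is roughly 0.0078 against 0.25, so there is very little variation in the outcome for the regressors to explain.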


    • #3
      I have not used it so far, but I suggest you take a look at penalized maximum likelihood estimation (i.e., the Firth method), available in Stata by installing the user-written program -firthlogit-, whose author is Joseph Coveney (by the way, a very active member of this Forum).

      You may also wish to read this excellent text on rare events in logistic regression, written by Richard Williams, also a very active member of this Forum.

      Hopefully that helps!
      Best regards,

      Marcos
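
      A minimal sketch of what trying the Firth approach could look like. The variable names event and x1 are hypothetical placeholders for the binary outcome and a single regressor, and the sketch assumes firthlogit can be installed from SSC:
      Code:
      // install the user-written command (assumed to be hosted on SSC)
      ssc install firthlogit
      
      // penalized-likelihood logit with one explanatory variable
      firthlogit event x1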


      • #4
        Anuja:
        two asides to previous helpful comments:
        - first, I would check for any data entry error in your dependent variable;
        - provided that no error is detected, with such a wide difference between 1s and 0s in the regressand, I would focus on descriptive statistics only.
        Kind regards,
        Carlo
        (StataNow 18.5)
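
        A sketch of the two checks suggested above, assuming the outcome is stored in a variable named event and one covariate in x1 (both names are hypothetical):
        Code:
        // frequency table of the outcome, including missing values, to spot coding errors
        tabulate event, missing
        
        // descriptive statistics of the covariate by outcome group
        tabstat x1, by(event) statistics(n mean sd min max)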


        • #5
          Thanks a lot! This helped a lot! I will try firthlogit; hopefully it will work. I will keep you posted.


          • #6
            There is another user-written command that I believe should also be considered in any context where you're contemplating using firthlogit:
            Code:
            search penlogit
            will show from where you can install it.

            And not least:
            Code:
            bayes, <considered priors>: logit
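
            As an illustration only, the placeholder above might be filled in along these lines. The variable names event and x1, the normal prior with mean 0 and variance 4, and the seed are all assumptions made for this sketch, not recommendations; the prior should reflect what is actually known about plausible effect sizes.
            Code:
            // normal(0, 4) prior on the log odds-ratio of x1; other parameters keep the default priors
            bayes, prior({event: x1}, normal(0, 4)) rseed(12345): logit event x1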


            • #7
              Hi, I tried penlogit, but the predicted values (option pr, i.e. the option to get the probability of a positive outcome) are negative. What would that mean?


              • #8
                Originally posted by Anuja Tandon
                I tried penlogit but the predicted values (option pr, i.e. the option to get the probability of a positive outcome) are negative. What would that mean?
                My guess is that you got linear predictions (log odds).


                • #9
                  Hello, author of -penlogit- here.

                  Joseph is right. Even by specifying the option pr, you'll get linear predictions (log odds). You can take the invlogit of the linear predictions to obtain probabilities.
                  I'll look into why the option pr doesn't work (even if -penlogit- has e(predict)=logit_p in ereturn list).
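
                  A minimal sketch of that workaround, run right after -penlogit-. The new variable names xb_hat and p_hat are arbitrary:
                  Code:
                  // linear prediction (log odds) from the fitted model
                  predict double xb_hat, xb
                  
                  // convert log odds to probabilities, which lie between 0 and 1
                  generate double p_hat = invlogit(xb_hat)
                  summarize p_hat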


                  • #10
                    Anuja Tandon

                    Thanks to Anuja for spotting an issue with -predict-'s behavior after -penlogit- (the fault is of course mine and does not depend on -predict-).

                    This has been addressed in version 1.1.0 of -penlogit- available on GitHub (https://github.com/anddis/penlogit). See below for more info.

                    Since -penlogit- calls -glm- under the hood, it's probably more logical for -penlogit- to use -glim_p- as opposed to -logit_p-. See help glm postestimation##predict

                    Code:
                    // Update penlogit to version 1.1.0 from GitHub
                    net install penlogit, from("https://raw.githubusercontent.com/anddis/penlogit/master/") replace
                    
                    // Load the full dataset on neonatal mortality (Neutra et al. 1978)
                    use http://www.imm.ki.se/biostatistics/data/neutra1978.dta, clear
                    
                    // Estimate penalized maximum-likelihood odds-ratio for "no monitoring" status and age at delivery
                    xi: penlogit death nomonit i.teenages, lfprior(nomonit log(2) 2000 2 0.5) nprior(_Iteenages_1 log(2) 0.5 _Iteenages_2 log(4) 0.5)
                    
                    // Calculates the linear predictor (log odds)
                    predict logodds, xb
                    
                    // Calculates the probability of a positive outcome
                    predict prob, mu
                    
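                    // Display results (syntax for Stata 16 and earlier; in Stata 17 or later use:
                    // table nomonit teenages, statistic(mean logodds prob))
                    table nomonit teenages, c(mean logodds mean prob)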
                    Last edited by Andrea Discacciati; 18 Jul 2017, 09:07.


                    • #11
                      Originally posted by Andrea Discacciati
                      This has been addressed in version 1.1.0 of -penlogit- available on GitHub (https://github.com/anddis/penlogit).
                      Code:
                      search penlogit
                      doesn't show that location from which to install the command. You might want to notify StataCorp that you've got the most recent version at that URL so that they can update the search locations to include it.


                      • #12
                        Joseph Coveney I'll run some more checks to make sure that everything's ok and then I'll submit a "Software update" to the Stata Journal.
                        At any rate, thank you for pointing out the possibility to make one's own website available through Stata (http://www.stata.com/support/faqs/re...ing-a-command/ – see 2). I honestly didn't know it was an option.


                        • #13
                          Thanks a lot Andrea and Joseph!


                          • #14
                            Hi! In my dataset only 3% of the observations are 1, i.e. 47 out of 1500. I tried penlogit, logit and firthlogit, and all of them give similar results. Does this mean that my results are robust, or is it garbage in, garbage out? Thanks!
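
                            One way to lay such estimates side by side, sketched with hypothetical variable names event and x1 and assuming -firthlogit- is installed (-penlogit- is left out here because its prior options depend on the application):
                            Code:
                            // ordinary maximum-likelihood logit
                            logit event x1
                            estimates store ml
                            
                            // Firth penalized-likelihood logit
                            firthlogit event x1
                            estimates store firth
                            
                            // coefficients and standard errors in one table
                            estimates table ml firth, b(%9.3f) se(%9.3f)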
