  • conditional logistic regression multicollinearity

    I am running a logistic regression on whether gender influences whether an undergraduate invests or not.

    Gender is a dummy variable: 0 = female; 1 = male
    Invest: 1 = yes; 0 = no
    I have included other explanatory variables: year group; finance module taken; income; whether someone claimed free school meals (freeschool)

    I then want to run a conditional logistic regression, that is, a separate logit for males and for females.

    Here is the command I use for female investors:

    logit invest1 i.gen1 i.finance1 i.year1 i.income1 i.freeschool1 if (gen1==0), robust

    But this flags up the message:

    "note: 5.year1 != 0 predicts success perfectly;
    5.year1 omitted and 2 obs not used.

    note: 5.income1 != 0 predicts failure perfectly;
    5.income1 omitted and 1 obs not used.

    note: 0.gen1 omitted because of collinearity."



    Running the next conditional logistic regression for male investors:

    logit invest1 i.gen1 i.finance1 i.year1 i.income1 i.freeschool1 if (gen1==1), robust


    and this comes up with the message:

    note: 2.year1 != 0 predicts success perfectly;
    2.year1 omitted and 2 obs not used.

    I read in other forums that "firthlogit" can overcome perfect-prediction problems; however, when I type it, Stata says the command is unrecognized.

    1. How do I overcome the collinearity issues in the 1st conditional regression?
    2. How do I overcome the issue of the perfect prediction?


    Note: my sample is very small (105 obs), so splitting it by gender for the conditional regressions makes each subsample even smaller.



  • #2
    The effect of a variable is a comparison. In your case you are comparing males and females. To do so you need both males and females. In your regressions you do your analysis separately for males and females, so no comparison is possible: in the "female model" there are no males to compare with and in the "male model" there are no females to compare with. So to get at the effect of gender, all genders need to be in your model. This can also solve your perfect prediction problem.
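
    As a minimal sketch of what that looks like, reusing the variable names from your post (not run on your data, so whether the perfect-prediction notes disappear is an assumption): drop the "if" condition so that both genders are in the estimation sample, then tabulate the predictors against the outcome to see which cells are still sparse.

    ```stata
    * Pooled model: both genders are in the sample, so i.gen1 can be estimated
    logit invest1 i.gen1 i.finance1 i.year1 i.income1 i.freeschool1, robust

    * Sparse or empty cells here are what trigger the "predicts ... perfectly" notes
    tabulate year1 invest1
    tabulate income1 invest1
    ```

    On the "unrecognized command" message: firthlogit is a community-contributed command, not part of official Stata, so it has to be installed before use. The sketch below assumes it is still distributed via SSC and that the installed version accepts factor-variable notation.

    ```stata
    * Install the community-contributed command once, then fit the penalized model
    ssc install firthlogit
    firthlogit invest1 i.gen1 i.finance1 i.year1 i.income1 i.freeschool1
    ```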
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Hi,

      thank you for your response!

      Sorry, I feel I was unclear. I was advised to run a conditional regression, separating the two genders, to see whether being female has an effect on investing.

      This was recommended because my initial logistic regression showed no statistical significance.
      The model is invest (yes/no) = female + finance module + income + year group, and I then check whether any of these variables is statistically significant. So I am not explicitly comparing the two genders in the model. However, when I split the sample, the model seems to work for males but produces collinearity messages for females. Also, because the groups are so small when using conditional logistic regression, there is a perfect prediction problem in the year variable.

      Would you recommend omitting the “year group” variable, as it is not vital to include?

      How can I overcome the collinearity?

      Apologies if I haven’t explained this very well!



      • #4
        It is logically impossible to compare genders if your model does not contain all genders. There is nothing else to say. If you want to include the variable gender, then you cannot select on that same variable.
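
        To make this concrete, here is a small sketch using the variable names from the first post (not run on the actual data). Within the female-only subsample gen1 takes a single value, which is why Stata reports "0.gen1 omitted because of collinearity". If you do select on gender, the gender term has to leave the model.

        ```stata
        * Within the subsample gen1 is constant, so there is nothing to estimate
        tabulate gen1 if gen1 == 0

        * A females-only model therefore cannot include i.gen1
        logit invest1 i.finance1 i.year1 i.income1 i.freeschool1 if gen1 == 0, robust
        ```

        Note that a model like this says nothing about the effect of gender, and the perfect-prediction notes for year1 and income1 may well remain in so small a subsample.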
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------
