
  • Inestimable margins

    Hello,

    I am trying to run -margins-, but the dy/dx of a key independent variable (month) is coming back "not estimable", even though many other similar -margins- commands that I run, using many of the same variables, work fine. When I run the code

    Code:
    logistic lfs sex##survmnth##loneyg if loneyg==1 & edu==2, or
    margins i.survmnth#loneyg, dydx(sex)
    I obtain:

    Code:
    Conditional marginal effects                               Number of obs = 416
    Model VCE: OIM
    
    Expression: Pr(lfs), predict()
    dy/dx wrt:  2.sex
    
    ---------------------------------------------------------------------------------------------
                                |            Delta-method
                                |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
    ----------------------------+----------------------------------------------------------------
    1.sex                       |  (base outcome)
    ----------------------------+----------------------------------------------------------------
    2.sex                       |
                survmnth#loneyg |
    Feb#Lone parents, yg child  |          .  (not estimable)
    Mar#Lone parents, yg child  |  -.0047096   .0789112    -0.06   0.952    -.1593727    .1499536
    Apr#Lone parents, yg child  |  -.1016548   .1129416    -0.90   0.368    -.3230164    .1197067
    May#Lone parents, yg child  |   .0138889    .146612     0.09   0.925    -.2734654    .3012432
    ---------------------------------------------------------------------------------------------
    How can I make the dy/dx for February estimable in this case?

    I also include some output from -dataex- for reference:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(sex survmnth) float(loneyg edu)
    1 5 . 2
    1 3 . 1
    1 3 . 1
    1 2 . 1
    1 3 . 0
    1 5 . 2
    1 2 . 1
    1 4 . 2
    1 2 . 2
    1 4 . 0
    1 5 . 0
    1 4 . 1
    1 2 . 1
    1 2 . 0
    1 2 . 0
    1 2 . 1
    1 2 . 0
    1 4 . 1
    1 2 . 1
    1 3 . 1
    1 5 . 0
    1 4 . 2
    1 5 . 2
    1 2 . 1
    1 2 . 1
    1 3 . 0
    1 3 . 0
    1 5 . 0
    1 4 . 1
    1 2 . 0
    1 2 0 1
    1 3 . 2
    1 5 . 0
    1 5 0 0
    1 4 . 1
    1 5 . 1
    1 3 . 1
    1 3 . 0
    1 2 . 2
    1 5 . 1
    1 2 . 0
    1 3 . 0
    1 4 . 1
    1 4 . 1
    1 5 . 1
    1 4 . 0
    1 2 . 2
    1 5 . 1
    1 5 . 2
    1 2 . 0
    1 4 . 1
    1 3 . 1
    1 2 . 1
    1 5 . 0
    1 3 . 1
    1 2 . 0
    1 3 . 0
    1 5 . 2
    1 3 . 1
    1 5 . 2
    1 2 . 1
    1 2 . 1
    1 4 . 2
    1 2 . 2
    1 2 . 1
    1 3 . 1
    1 5 . 1
    1 3 . 2
    1 2 . 1
    1 2 . 0
    1 4 . 1
    1 5 . 1
    1 4 . 0
    1 5 . 0
    1 4 . 1
    1 5 . 1
    1 4 . 1
    1 4 . 2
    1 4 . 1
    1 2 . 1
    1 2 . 0
    1 3 . 0
    1 5 . 1
    1 4 . 1
    1 3 . 0
    1 3 . 1
    1 5 . 1
    1 3 . 0
    1 4 . 0
    1 3 . 1
    1 5 . 0
    1 2 . 2
    1 3 . 2
    1 5 . 1
    1 2 . 1
    1 4 . 0
    1 4 . 1
    1 3 . 2
    1 2 . 0
    1 3 . 0
    1 2 . 0
    1 5 . 2
    1 4 . 1
    1 2 . 1
    1 4 . 2
    1 4 . 0
    1 5 . 0
    1 2 . 1
    1 3 . 1
    1 5 . 2
    1 2 . 0
    1 3 . 1
    1 3 . 0
    1 5 . 2
    1 4 . 1
    1 2 . 0
    1 4 . 2
    1 3 . 2
    1 5 . 1
    1 3 . 1
    1 3 . 0
    1 5 . 2
    1 4 . 1
    1 2 . 0
    1 4 . 0
    1 4 . 0
    1 2 . 0
    1 5 1 1
    1 2 . 2
    1 3 . 1
    1 5 . 1
    1 3 . 1
    1 5 . 1
    1 2 . 0
    1 5 . 2
    1 5 . 2
    1 2 . 1
    1 2 . 2
    1 4 . 1
    1 3 . 1
    1 3 . 1
    1 5 . 0
    1 3 . 1
    1 2 . 2
    1 3 . 2
    1 3 . 1
    1 4 . 1
    1 5 . 2
    1 2 . 1
    1 3 . 0
    end
    label values sex SEX
    label def SEX 1 "Male", modify
    label values survmnth survmnth
    label def survmnth 2 "Feb", modify
    label def survmnth 3 "Mar", modify
    label def survmnth 4 "Apr", modify
    label def survmnth 5 "May", modify
    label values loneyg loneyg
    label def loneyg 0 "Lone parents, old child", modify
    label def loneyg 1 "Lone parents, yg child", modify
    label values edu edu
    label def edu 0 "(<)HS", modify
    label def edu 1 "some uni/college deg/trades", modify
    label def edu 2 "BA degree+", modify

  • #2
    To understand the output of your -margins- command, it would help to see the output of your -logistic- command.

    Even without seeing it, I don't understand how you can include loneyg as an independent variable in the model when you restrict your estimation sample to a single value of loneyg. I expect the logistic output will show that no coefficient was estimated for loneyg; in that case survmnth#loneyg in your margins command reduces to survmnth, and February is the omitted (base) month in the logistic model results.



    • #3
      The output of the -logistic- command
      Code:
      logistic lfs sex##survmnth##loneyg if loneyg==1 & edu==2
      is as follows:

      Code:
      note: 1.sex#2.survmnth != 0 predicts success perfectly;
            1.sex#2.survmnth omitted and 12 obs not used.
      
      note: 2.sex#5.survmnth omitted because of collinearity.
      note: 1.loneyg omitted because of collinearity.
      note: 2.sex#1.loneyg omitted because of collinearity.
      note: 3.survmnth#1.loneyg omitted because of collinearity.
      note: 4.survmnth#1.loneyg omitted because of collinearity.
      note: 5.survmnth#1.loneyg omitted because of collinearity.
      note: 2.sex#3.survmnth#1.loneyg omitted because of collinearity.
      note: 2.sex#4.survmnth#1.loneyg omitted because of collinearity.
      note: 2.sex#5.survmnth#1.loneyg omitted because of collinearity.
      
      Logistic regression                                     Number of obs =    416
                                                              LR chi2(6)    =  20.32
                                                              Prob > chi2   = 0.0024
      Log likelihood = -148.51315                             Pseudo R2     = 0.0640
      
      ----------------------------------------------------------------------------------------------------
                                     lfs | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
      -----------------------------------+----------------------------------------------------------------
                                     sex |
                                 Female  |   1.085714   .9257697     0.10   0.923     .2041319    5.774579
                                         |
                                survmnth |
                                    Mar  |   .6797516   .9578863    -0.27   0.784       .04294    10.76066
                                    Apr  |   .4531677   .6452557    -0.56   0.578     .0278132    7.383572
                                    May  |   .1982609   .1010105    -3.18   0.001     .0730406    .5381578
                                         |
                            sex#survmnth |
                               Male#Feb  |          1  (empty)
                             Female#Mar  |   .8634868   1.204715    -0.11   0.916     .0560636    13.29935
                             Female#Apr  |   .4259868   .5895851    -0.62   0.538      .028268    6.419447
                             Female#May  |          1  (omitted)
                                         |
                                  loneyg |
                 Lone parents, yg child  |          1  (omitted)
                                         |
                              sex#loneyg |
          Female#Lone parents, yg child  |          1  (omitted)
                                         |
                         survmnth#loneyg |
             Mar#Lone parents, yg child  |          1  (omitted)
             Apr#Lone parents, yg child  |          1  (omitted)
             May#Lone parents, yg child  |          1  (omitted)
                                         |
                     sex#survmnth#loneyg |
        Male#Feb#Lone parents, yg child  |          1  (empty)
      Female#Mar#Lone parents, yg child  |          1  (omitted)
      Female#Apr#Lone parents, yg child  |          1  (omitted)
      Female#May#Lone parents, yg child  |          1  (omitted)
                                         |
                                   _cons |   17.65351   16.77018     3.02   0.003     2.742969    113.6164
      ----------------------------------------------------------------------------------------------------
      It may be hard to see in the -dataex- output above because there are so few non-missing values, but loneyg does take the values 0 and 1 (lone parents of older or younger children): n = 3,678 for 0 ("older") and n = 1,919 for 1 ("younger").

      Also, here is some more suitable output from -dataex-:


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(sex survmnth) float(edu loneyg lfs)
      1 2 1 1 1
      1 2 0 0 1
      1 3 0 1 1
      1 3 2 0 1
      1 3 0 1 1
      1 4 2 0 0
      1 2 2 1 1
      1 3 0 0 0
      1 5 2 0 1
      1 4 0 1 0
      1 3 1 0 1
      1 2 1 0 1
      1 2 0 0 1
      1 2 0 1 0
      1 4 2 0 1
      1 2 1 1 1
      1 4 2 0 1
      1 5 0 1 1
      1 4 2 0 1
      1 2 0 1 1
      1 3 1 1 1
      1 5 2 0 1
      1 2 1 0 1
      1 5 1 1 0
      1 2 0 0 1
      1 3 1 1 0
      1 4 1 1 1
      1 3 2 0 1
      1 3 1 0 1
      1 4 2 0 1
      1 3 0 0 1
      1 2 1 1 1
      1 4 0 1 1
      1 5 1 1 1
      1 3 1 1 1
      1 2 0 0 1
      1 5 1 0 0
      1 3 1 0 0
      1 3 1 1 0
      1 3 0 0 1
      1 5 1 0 1
      1 4 1 0 1
      1 3 0 0 1
      1 2 1 0 1
      1 4 0 1 1
      1 2 1 0 1
      1 5 2 0 1
      1 2 1 0 1
      1 3 1 1 1
      1 2 1 0 1
      1 3 1 1 1
      1 4 0 0 1
      1 5 1 1 1
      1 2 1 0 1
      1 2 0 0 1
      1 3 0 0 1
      1 3 1 1 1
      1 4 2 0 1
      1 5 2 0 0
      1 3 1 0 1
      1 5 0 0 1
      1 3 0 0 1
      1 4 1 0 1
      1 3 2 0 1
      1 5 1 0 1
      1 3 1 0 1
      1 4 1 0 0
      1 3 0 0 1
      1 2 1 1 1
      1 2 0 0 1
      1 4 2 0 1
      1 4 1 1 1
      1 2 0 0 1
      1 3 1 1 1
      1 5 1 1 1
      1 5 1 1 0
      1 4 2 0 1
      1 2 1 1 1
      1 4 1 1 1
      1 2 1 0 1
      1 2 2 0 1
      1 4 1 0 1
      1 2 1 1 1
      1 3 1 0 1
      1 2 1 0 1
      1 5 0 1 1
      1 4 1 1 1
      1 5 1 0 1
      1 5 1 0 0
      1 4 0 1 1
      1 4 2 0 1
      1 5 0 1 0
      1 4 2 0 1
      1 4 1 0 1
      1 2 1 0 1
      1 3 1 0 1
      1 2 2 0 1
      1 5 0 0 0
      1 4 1 0 1
      1 5 1 0 1
      end
      label values sex SEX
      label def SEX 1 "Male", modify
      label values survmnth survmnth
      label def survmnth 2 "Feb", modify
      label def survmnth 3 "Mar", modify
      label def survmnth 4 "Apr", modify
      label def survmnth 5 "May", modify
      label values edu edu
      label def edu 0 "(<)HS", modify
      label def edu 1 "some uni/college deg/trades", modify
      label def edu 2 "BA degree+", modify
      label values loneyg loneyg
      label def loneyg 0 "Lone parents, old child", modify
      label def loneyg 1 "Lone parents, yg child", modify
      label values lfs lfs
      label def lfs 0 "not", modify
      label def lfs 1 "Employed", modify
      Last edited by Alex McIntosh; 03 Apr 2022, 15:19. Reason: Edited for -dataex-



      • #4
        It may be hard to see in the -dataex- output above because there are so few non-missing values, but loneyg does take the values 0 and 1 (lone parents of older or younger children): n = 3,678 for 0 ("older") and n = 1,919 for 1 ("younger").
        No, it was apparent in the -dataex- output. But your logistic command and its output show clearly:
        Code:
        logistic lfs sex##survmnth##loneyg if loneyg==1 & edu==2
        ...
        note: 1.loneyg omitted because of collinearity.
        So all 416 observations included in your logistic regression have loneyg==1. That means loneyg is collinear with the constant, so loneyg is omitted from the model, and so are all the interactions in which it appears.
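
        A minimal sketch of how to check this yourself, using made-up toy data rather than the poster's data set (the variable names mirror the thread; every value is hypothetical): fit the model with the same kind of if restriction, then summarize loneyg within e(sample). A standard deviation of 0 means loneyg is constant in the estimation sample, which is why -logistic- has to omit it. Note that -clear- discards whatever data are in memory.

        ```stata
        * Hypothetical toy data (NOT the poster's data); -clear- drops data in memory
        clear
        set obs 400
        generate byte sex      = 1 + (_n > 200)      // 1 = Male, 2 = Female
        generate byte survmnth = 2 + mod(_n, 4)      // 2 = Feb, 3 = Mar, 4 = Apr, 5 = May
        generate byte loneyg   = _n <= 300           // takes both values 0 and 1 overall
        generate byte lfs      = mod(_n, 5) != 0     // deterministic 0/1 outcome, 80% ones

        * Same structure as the original command: restrict to one value of loneyg
        logistic lfs i.sex##i.survmnth##i.loneyg if loneyg == 1

        * Within the estimation sample loneyg never varies (SD = 0), so it is
        * collinear with the constant and is omitted along with its interactions
        summarize loneyg if e(sample)
        ```

        With the real data, only the final -summarize- line is needed, run immediately after the -logistic- command.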



        • #5
          Originally posted by William Lisowski

          in your logistic command and its output we see clearly
          Code:
          logistic lfs sex##survmnth##loneyg if loneyg==1 & edu==2
          ...
          note: 1.loneyg omitted because of collinearity.
          So among the 416 observations included in your logistic regression, all of them have loneyg==1 which means loneyg is collinear with the constant and thus loneyg is omitted from the model, and thus all the interactions within which it appears are also omitted.
          So does this mean I should remove the if loneyg==1 condition? Trying this out:
          Code:
          logistic lfs sex##survmnth##loneyg if edu==2
          produces:
          Code:
          note: 1.sex#2.survmnth != 0 predicts success perfectly;
                1.sex#2.survmnth omitted and 76 obs not used.
          
          note: 2.sex#5.survmnth omitted because of collinearity.
          note: 2.sex#5.survmnth#1.loneyg omitted because of collinearity.
          
          Logistic regression                                     Number of obs =  1,397
                                                                  LR chi2(13)   =  37.54
                                                                  Prob > chi2   = 0.0003
          Log likelihood = -418.21098                             Pseudo R2     = 0.0430
          
          ----------------------------------------------------------------------------------------------------
                                         lfs | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
          -----------------------------------+----------------------------------------------------------------
                                         sex |
                                     Female  |   .2295918   .2395319    -1.41   0.158       .02971    1.774232
                                             |
                                    survmnth |
                                        Mar  |   .1933405   .2335623    -1.36   0.174     .0181151    2.063504
                                        Apr  |   .1001228   .1164968    -1.98   0.048     .0102359    .9793544
                                        May  |   .5921053   .2465027    -1.26   0.208     .2618366    1.338959
                                             |
                                sex#survmnth |
                                   Male#Feb  |          1  (empty)
                                 Female#Mar  |   2.650585   3.149767     0.82   0.412     .2581275    27.21756
                                 Female#Apr  |   4.205364   4.794297     1.26   0.208     .4501912     39.2835
                                 Female#May  |          1  (omitted)
                                             |
                                      loneyg |
                     Lone parents, yg child  |   .2133211   .3088487    -1.07   0.286     .0124927    3.642612
                                             |
                                  sex#loneyg |
              Female#Lone parents, yg child  |   4.728889   6.371783     1.15   0.249       .33716    66.32575
                                             |
                             survmnth#loneyg |
                 Mar#Lone parents, yg child  |   3.515826   6.525732     0.68   0.498     .0924933    133.6424
                 Apr#Lone parents, yg child  |   4.526121   8.322717     0.82   0.412     .1231744    166.3151
                 May#Lone parents, yg child  |   .3348406   .2203067    -1.66   0.096     .0922135    1.215855
                                             |
                         sex#survmnth#loneyg |
           Male#Feb#Lone parents, old child  |          1  (empty)
            Male#Feb#Lone parents, yg child  |          1  (empty)
          Female#Mar#Lone parents, yg child  |   .3257722   .5970293    -0.61   0.541     .0089733    11.82704
          Female#Apr#Lone parents, yg child  |   .1012961    .181636    -1.28   0.202      .003015    3.403292
          Female#May#Lone parents, yg child  |          1  (omitted)
                                             |
                                       _cons |   82.75556   90.41691     4.04   0.000     9.722837    704.3707
          ----------------------------------------------------------------------------------------------------
          I used if loneyg==1 to examine the employment gender gap among university-educated (edu==2) lone parents of children under 6 (loneyg==1). For this model, I only want to examine the gap among those with younger children. I'm not sure whether I should use different syntax to select this subgroup, but that is my goal for this model. How might I deal with the collinearity while still examining this subgroup?

          But the same problem of an inestimable month (February) persists when if loneyg==1 is removed, as the output above shows.

          For whatever reason, if my dependent variable is coded slightly more exclusively (to include only those who were employed and at work during COVID), an otherwise identical command produces output in which February is estimable. For example, the following code

          Code:
          logistic lfs1 sex##survmnth##loneyg if loneyg==1 & edu==2
          gives the result:
          Code:
          logistic lfs1 sex##survmnth##loneyg if loneyg==1 & edu==2
          note: 1.loneyg omitted because of collinearity.
          note: 2.sex#1.loneyg omitted because of collinearity.
          note: 3.survmnth#1.loneyg omitted because of collinearity.
          note: 4.survmnth#1.loneyg omitted because of collinearity.
          note: 5.survmnth#1.loneyg omitted because of collinearity.
          note: 2.sex#3.survmnth#1.loneyg omitted because of collinearity.
          note: 2.sex#4.survmnth#1.loneyg omitted because of collinearity.
          note: 2.sex#5.survmnth#1.loneyg omitted because of collinearity.
          
          Logistic regression                                     Number of obs =    428
                                                                  LR chi2(7)    =  16.38
                                                                  Prob > chi2   = 0.0219
          Log likelihood = -253.76826                             Pseudo R2     = 0.0313
          
          ----------------------------------------------------------------------------------------------------
                                        lfs1 | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
          -----------------------------------+----------------------------------------------------------------
                                         sex |
                                     Female  |   .3490911   .3729439    -0.99   0.325     .0430107    2.833358
                                             |
                                    survmnth |
                                        Mar  |   .2045456    .246477    -1.32   0.188     .0192794    2.170132
                                        Apr  |   .7272732   1.082614    -0.21   0.831     .0393192     13.4521
                                        May  |   .1818183   .2293296    -1.35   0.177     .0153464    2.154113
                                             |
                                sex#survmnth |
                                 Female#Mar  |   2.507714   3.120189     0.74   0.460      .218868    28.73253
                                 Female#Apr  |   .6937658   1.055278    -0.24   0.810     .0351934    13.67616
                                 Female#May  |   2.005207   2.613292     0.53   0.593     .1558936     25.7923
                                             |
                                      loneyg |
                     Lone parents, yg child  |          1  (omitted)
                                             |
                                  sex#loneyg |
              Female#Lone parents, yg child  |          1  (omitted)
                                             |
                             survmnth#loneyg |
                 Mar#Lone parents, yg child  |          1  (omitted)
                 Apr#Lone parents, yg child  |          1  (omitted)
                 May#Lone parents, yg child  |          1  (omitted)
                                             |
                         sex#survmnth#loneyg |
          Female#Mar#Lone parents, yg child  |          1  (omitted)
          Female#Apr#Lone parents, yg child  |          1  (omitted)
          Female#May#Lone parents, yg child  |          1  (omitted)
                                             |
                                       _cons |   10.99999   11.48912     2.30   0.022     1.420174    85.20071
          ----------------------------------------------------------------------------------------------------
          So at this point I have two issues: 1) the original question, how to make February estimable; and 2) how to deal with the collinearity while selecting a particular subgroup for comparison.



          • #6
            So this means I should remove if loneyg?
            ...
            I tried to use if loneyg==1 to observe the employment gender gap among lone parents of children <6 (loneyg=1), with university education (edu=2). For this model, I am only trying to examine the gap among those with younger children. Not sure if I should use different syntax to go about selecting this specific subgroup for examination, but that is my goal for this model.
            To me that suggests your model should perhaps be
            Code:
            logistic lfs sex##survmnth if loneyg==1 & edu==2, or
            margins i.survmnth, dydx(sex)

            For whatever reason, if my dependent variable is coded slightly more exclusively (to include only those who were employed and at work during COVID), a line of code which is otherwise the same produces output where February is estimable.
            With the original dependent variable you see
            Code:
            note: 1.sex#2.survmnth != 0 predicts success perfectly;
                  1.sex#2.survmnth omitted and 12 obs not used.
            ...
            Logistic regression                                     Number of obs =    416
            while with the revised dependent variable you see
            Code:
            Logistic regression                                     Number of obs =    428
            because, with lfs1, it is no longer the case that all 12 observations with sex==1 and survmnth==2 have an outcome of 1, as they all do with lfs.

            When I read "for whatever reason" in your explanation, it suggests to me that you are grasping at straws, trying whatever you think of to get some results, regardless of the interpretation of what you are trying.

            Your understanding of logistic regression, of how to interpret its output, and of the margins that result from it would benefit from time spent reviewing the first three lectures in the Categorical Data Analysis course notes prepared by Richard Williams, a frequent contributor here, at https://www3.nd.edu/~rwilliam/xsoc73994/index.html.



            • #7
              When I read "for whatever reason" in your explanation, it suggests to me that you are grasping at straws, trying whatever you think of to get some results, regardless of the interpretation of what you are trying.
              Apologies; I must admit I am not proficient in Stata. Still, I can't imagine why I would be on statalist.org if I were paying no regard to the interpretation of my results. In any case, I appreciate the recommendation of the Categorical Data Analysis course materials; I am always glad to learn more.

              That said, when I run the suggested code
              Code:
              logistic lfs sex##survmnth if loneyg==1 & edu==2, or
              the output is
              Code:
              note: 1.sex#2.survmnth != 0 predicts success perfectly;
                    1.sex#2.survmnth omitted and 12 obs not used.
              
              note: 2.sex#5.survmnth omitted because of collinearity.
              
              Logistic regression                                     Number of obs =    416
                                                                      LR chi2(6)    =  20.32
                                                                      Prob > chi2   = 0.0024
              Log likelihood = -148.51315                             Pseudo R2     = 0.0640
              
              ------------------------------------------------------------------------------
                       lfs | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                       sex |
                   Female  |   1.085714   .9257697     0.10   0.923     .2041319    5.774579
                           |
                  survmnth |
                      Mar  |   .6797516   .9578863    -0.27   0.784       .04294    10.76066
                      Apr  |   .4531677   .6452557    -0.56   0.578     .0278132    7.383572
                      May  |   .1982609   .1010105    -3.18   0.001     .0730406    .5381578
                           |
              sex#survmnth |
                 Male#Feb  |          1  (empty)
               Female#Mar  |   .8634868   1.204715    -0.11   0.916     .0560636    13.29935
               Female#Apr  |   .4259868   .5895851    -0.62   0.538      .028268    6.419447
               Female#May  |          1  (omitted)
                           |
                     _cons |   17.65351   16.77018     3.02   0.003     2.742969    113.6164
              ------------------------------------------------------------------------------
              So the original issue persists, and February remains inestimable. According to
              Code:
              1.sex#2.survmnth omitted and 12 obs not used.
              it seems those 12 observations with lfs==0 are not used here (or at least those observations where sex==1 and survmnth==2). I do not know why the revised dependent variable includes these 12 while the original DV excludes them (I am also unsure what is causing the collinearity for sex==2 & survmnth==5).



              • #8
                In the logistic regression output in #7, all of the observations for male sex in February (1.sex#2.survmnth) have been omitted because of perfect prediction: lfs is always 1 for that combination of month and sex (at least in the subpopulation you are estimating in, namely edu == 2 and loneyg == 1). So the estimation sample contains no such observations, and effects involving February become inestimable because there is no information about males in February. That this doesn't happen with the revised variable tells me that, with the revised version of lfs, there are at least some males in February for whom the outcome is 0.
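
                A sketch of how one might locate such cells before fitting the model, again on hypothetical toy data (not the poster's) built so that every male observed in February has lfs==1. The cross-tabulation shows the empty outcome column, the -egen- step flags sex-by-month cells whose mean outcome is exactly 0 or 1, and the final -count- checks, assuming -logistic- behaves as in the output in #7, that those observations are absent from e(sample).

                ```stata
                * Hypothetical toy data (NOT the poster's data); -clear- drops data in memory
                clear
                set obs 400
                generate byte sex      = 1 + (_n > 200)          // 1 = Male, 2 = Female
                generate byte survmnth = 2 + mod(_n, 4)          // 2 = Feb, 3 = Mar, 4 = Apr, 5 = May
                generate byte lfs      = mod(_n, 5) != 0         // 80% ones in every sex-month cell
                replace lfs = 1 if sex == 1 & survmnth == 2      // ...except Male#Feb: all ones

                * 1. Cross-tabulate month by outcome within each sex: the Feb row
                *    for males has no observations with lfs==0
                by sex, sort: tabulate survmnth lfs

                * 2. Flag sex-month cells whose outcome never varies (mean exactly 0 or 1)
                egen double cellmean = mean(lfs), by(sex survmnth)
                egen byte   firstobs = tag(sex survmnth)         // marks one row per cell
                list sex survmnth cellmean if firstobs & inlist(cellmean, 0, 1), noobs

                * 3. Fit the model; the flagged observations should not be in e(sample)
                logistic lfs i.sex##i.survmnth
                count if e(sample) & sex == 1 & survmnth == 2
                ```

                On the real data, the same steps would be run on the subsample with loneyg==1 & edu==2.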

                The absence of males in February from the estimation sample is also why you cannot get a marginal effect estimate for February: the marginal effect of sex in any month is the difference between the predicted probabilities for females and males in that month, but you have no males in February to compute it with, so Stata honestly confesses that what you have asked it to do is not possible.

                One option is simply to forgo the February estimate. Is it important? If you really need it, then you have to abandon ordinary (maximum-likelihood) logistic regression. Two alternatives that might work for you are a linear probability model or -firthlogit- (by Joseph Coveney, available from SSC). A linear probability model might be dicey for these data: if a large subset of the data offers near-perfect prediction, then a substantial part of the data set may have predicted probabilities very close to 1, and linear probability models don't work well near 0 or 1. So -firthlogit- might be a better bet. It fits a logistic regression model, but instead of estimating it by maximum likelihood, it uses penalized maximum likelihood. That enables it to tolerate perfect prediction without expelling any observations from the estimation sample.
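
                A rough sketch of both alternatives on hypothetical toy data (not the poster's data), in which all males in February have lfs==1. The linear probability model uses every observation, so -margins- can report the February female-minus-male difference. For -firthlogit-, the indicator variables are built by hand as a precaution: whether a given installed version accepts factor-variable notation or works with -margins- is an assumption here, not something confirmed, so the February gap is computed with -nlcom- from the coefficients instead. Check -help firthlogit- for what your version supports.

                ```stata
                * Hypothetical toy data (NOT the poster's data); -clear- drops data in memory
                clear
                set obs 400
                generate byte sex      = 1 + (_n > 200)          // 1 = Male, 2 = Female
                generate byte survmnth = 2 + mod(_n, 4)          // 2 = Feb, 3 = Mar, 4 = Apr, 5 = May
                generate byte lfs      = mod(_n, 5) != 0
                replace lfs = 1 if sex == 1 & survmnth == 2      // perfect prediction in Male#Feb

                * Option 1: linear probability model with robust standard errors.
                * No observations are dropped, so the February gap is estimable.
                regress lfs i.sex##i.survmnth, vce(robust)
                margins survmnth, dydx(sex)

                * Option 2: Firth's penalized-likelihood logit (-firthlogit-, from SSC:
                * ssc install firthlogit). Indicators are built by hand as a precaution,
                * in case the installed version does not accept factor-variable notation.
                generate byte female  = sex == 2
                generate byte mar     = survmnth == 3
                generate byte apr     = survmnth == 4
                generate byte may     = survmnth == 5
                generate byte fem_mar = female * mar
                generate byte fem_apr = female * apr
                generate byte fem_may = female * may

                capture which firthlogit
                if _rc == 0 {
                    firthlogit lfs female mar apr may fem_mar fem_apr fem_may
                    * February is the base month, so the female-minus-male difference
                    * in predicted Pr(lfs==1) for February is:
                    nlcom invlogit(_b[_cons] + _b[female]) - invlogit(_b[_cons])
                }
                else {
                    display as text "-firthlogit- is not installed; type: ssc install firthlogit"
                }
                ```

                The -capture which- guard simply skips the -firthlogit- part when the package is not installed, so the linear probability model still runs.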

                Added: I think the reason 2.sex#5.survmnth becomes collinear is that, but for its elimination due to perfect prediction, 1.sex#2.survmnth would ordinarily be the reference category for the sex#survmnth interaction terms. Since no such observations are retained in the estimation sample, you are left with a complete set of indicators for all the interaction combinations with no omitted reference category, so together they are collinear with the constant term. One of them therefore has to be eliminated to break that collinearity, and 2.sex#5.survmnth, being the "last" one, gets picked. You will see this whenever the anticipated reference category for some group of indicators is dropped from the estimation sample: some other category must then also be omitted to break the resulting collinearity with the constant term.
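
                The counting argument can be checked on hypothetical toy data (not the poster's) with the same empty Male#Feb cell. After the fit, the cross-tabulation of e(sample) should show 7 populated sex-by-month cells, while the design has 8 candidate terms (_cons, 2.sex, three month effects and three Female#month interactions), so at most 7 can be estimated and e(df_m) should be 6, matching the LR chi2(6) in the output in #7. Which extra term Stata chooses to omit is its own rule; the sketch does not depend on it.

                ```stata
                * Hypothetical toy data (NOT the poster's data); -clear- drops data in memory
                clear
                set obs 400
                generate byte sex      = 1 + (_n > 200)          // 1 = Male, 2 = Female
                generate byte survmnth = 2 + mod(_n, 4)          // 2 = Feb, 3 = Mar, 4 = Apr, 5 = May
                generate byte lfs      = mod(_n, 5) != 0
                replace lfs = 1 if sex == 1 & survmnth == 2      // Male#Feb perfectly predicted

                logistic lfs i.sex##i.survmnth

                * Cells left in the estimation sample: Male#Feb is empty, leaving 7 cells
                tabulate sex survmnth if e(sample)

                * 8 candidate terms but only 7 cells, so one more term must be omitted
                * for collinearity; compare the rank and model df with those counts
                display "parameters estimated, e(rank): " e(rank)
                display "model df, e(df_m):             " e(df_m)
                ```
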
                Last edited by Clyde Schechter; 03 Apr 2022, 19:53.

