
  • Logistic regression and interactions with rare events

    Dear Statalist users,

    My aim:
    I'm running a logistic regression with an interaction term (EducationLevel##EthnicGroup).
    The dependent variable is participation in training. I want to compare the marginal effect of ethnic group on participation at each educational level (low, middle, high).

    Code:
    . logistic Participation i.EducationLevel##i.EthnicGroup
    . margins, dydx(EthnicGroup) at(EducationLevel=(1(1)3))

    My Sample:
    My problem is that one of the ethnic groups (EthnicGroup1) is rather large (n=6219), so every educational level has a substantial number of events (see table group1), whereas the second group (EthnicGroup2) is small (n=315) and its lowest educational level has only 13 events (see table group2).


    Code:
    . tab Education Participation

                    |  Participation in training
    Education Level |
    (group1)        |         0          1 |     Total
    ----------------+----------------------+----------
                low |       836        304 |      1140
             middle |      2041       1278 |      3319
               high |       895        865 |      1760
    ----------------+----------------------+----------
              Total |      3772       2447 |      6219
    
    
    
    . tab Education Participation

                    |  Participation in training
    Education Level |
    (group2)        |         0          1 |     Total
    ----------------+----------------------+----------
                low |        91         13 |       104
             middle |       112         25 |       137
               high |        47         27 |        74
    ----------------+----------------------+----------
              Total |       250         65 |       315



    My Question:
    Should I consider an estimation method for rare events (e.g. the Firth method)?

    Thank you for your help
    Sara

  • #2
    It doesn't seem to make much difference. Do you have more covariates?

    .
    . version 15.1

    .
    . clear *

    .
    . input byte participate str6 education byte race int count

         partic~e  education      race     count
      1. 0 low     1  836
      2. 1 low     1  304
      3. 0 middle  1 2041
      4. 1 middle  1 1278
      5. 0 high    1  895
      6. 1 high    1  865
      7. 0 low     2   91
      8. 1 low     2   13
      9. 0 middle  2  112
     10. 1 middle  2   25
     11. 0 high    2   47
     12. 1 high    2   27
     13. end

    .
    . assert !missing(participate, education, race, count)

    .
    . label define EducationLevel 1 low 2 middle 3 high

    . encode education, generate(level) label(EducationLevel) noextend

    .
    . table participate level race [fweight=count], contents(n count)

    ------------------------------------------------------------
              |                  race and level
    participa | ---------- 1 ---------    ---------- 2 ---------
    te        |    low  middle    high       low  middle    high
    ----------+-------------------------------------------------
            0 |    836   2,041     895        91     112      47
            1 |    304   1,278     865        13      25      27
    ------------------------------------------------------------

    .
    . logit participate i.level##i.race [fweight=count], nolog

    Logistic regression                             Number of obs     =      6,534
                                                    LR chi2(5)        =     214.54
                                                    Prob > chi2       =     0.0000
    Log likelihood = -4245.6844                     Pseudo R2         =     0.0246

    ------------------------------------------------------------------------------
     participate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           level |
         middle  |   .5434574    .075882     7.16   0.000     .3947314    .6921833
           high  |   .9775067   .0822133    11.89   0.000     .8163715    1.138642
                 |
          2.race |  -.9343092     .30397    -3.07   0.002    -1.530079   -.3385391
                 |
      level#race |
       middle#2  |  -.0971703   .3776225    -0.26   0.797    -.8372967    .6429562
         high#2  |   .4140927   .3911327     1.06   0.290    -.3525132    1.180699
                 |
           _cons |  -1.011601    .066975   -15.10   0.000    -1.142869   -.8803324
    ------------------------------------------------------------------------------

    . margins, dydx(race) at( level = (1(1)3) ) noatlegend

    Conditional marginal effects                    Number of obs     =      6,534
    Model VCE    : OIM

    Expression   : Pr(participate), predict()
    dy/dx w.r.t. : 2.race

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.race       |  (base outcome)
    -------------+----------------------------------------------------------------
    2.race       |
             _at |
              1  |  -.1416667   .0349746    -4.05   0.000    -.2102156   -.0731177
              2  |   -.202574   .0340626    -5.95   0.000    -.2693356   -.1358124
              3  |  -.1266124   .0572154    -2.21   0.027    -.2387526   -.0144722
    ------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.

    .
    . firthlogit participate i.level##i.race [fweight=count], nolog

                                                    Number of obs     =      6,534
                                                    Wald chi2(5)      =     194.13
    Penalized log likelihood = -4248.6141           Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------
     participate |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           level |
         middle  |   .5425582   .0758388     7.15   0.000     .3939168    .6911995
           high  |   .9764807   .0821682    11.88   0.000     .8154339    1.137527
                 |
          2.race |  -.9030926   .2991349    -3.02   0.003    -1.489386    -.316799
                 |
      level#race |
       middle#2  |  -.1131856    .372637    -0.30   0.761    -.8435407    .6171695
         high#2  |   .3906223   .3862248     1.01   0.312    -.3663644    1.147609
                 |
           _cons |  -1.010555   .0669293   -15.10   0.000    -1.141734   -.8793765
    ------------------------------------------------------------------------------

    . tempname B

    . matrix define `B' = e(b)

    . quietly logit participate i.level##i.race [fweight=count], from(`B', copy) iterate(0) nolog

    . margins, dydx(race) at( level = (1(1)3) ) noatlegend

    Conditional marginal effects                    Number of obs     =      6,534
    Model VCE    : OIM

    Expression   : Pr(participate), predict()
    dy/dx w.r.t. : 2.race

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    1.race       |  (base outcome)
    -------------+----------------------------------------------------------------
    2.race       |
             _at |
              1  |  -.1382996   .0353403    -3.91   0.000    -.2075654   -.0690338
              2  |  -.2003079   .0342183    -5.85   0.000    -.2673745   -.1332412
              3  |  -.1248158   .0572725    -2.18   0.029    -.2370679   -.0125637
    ------------------------------------------------------------------------------
    Note: dy/dx for factor levels is the discrete change from the base level.

    .
    . exit

    end of do-file
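
One reason the two fits agree so closely may be worth spelling out. With only the two factors and their interaction, the model is saturated, so the fitted probabilities from logit are simply the observed cell proportions, and each discrete change reported by margins is a difference of two proportions. As far as I can tell, the Firth penalty in this special case amounts to adding half an observation to every cell, which barely moves cells of this size. A minimal sketch of both checks, using only the counts from the table (plain display arithmetic, nothing is refit):

```stata
* Unpenalized fit: difference in observed proportions, race 2 minus race 1
display %10.7f 13/104 - 304/1140
display %10.7f 25/137 - 1278/3319
display %10.7f 27/74 - 865/1760

* Firth-type fit in a saturated model: add 0.5 to every cell first
display %10.7f 13.5/105 - 304.5/1141
display %10.7f 25.5/138 - 1278.5/3320
display %10.7f 27.5/75 - 865.5/1761
```

The first three lines should reproduce the dy/dx column after logit, and the last three should land very close to the column after firthlogit. The shift is roughly 0.002 to 0.003 in each case. Once further covariates are added, the model is no longer saturated and this shortcut no longer applies.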




    • #3
      Dear Joseph,

      Thank you for your help. I was able to replicate your suggestion with my data and all the covariates (there are actually 12 in total).
      As you said, it does not make a major difference.


      How do I decide if I should go for firthlogit? Is fitstat not available after firthlogit?
      Can I run postestimation commands like margins after firthlogit without worries?

      Best regards

      Sara



      • #4
        Originally posted by Sara Reiter
        How do I decide if I should go for firthlogit?
        Opinions vary. I'm not aware of any universally accepted guidance.

        Is fitstat not available after firthlogit?
        I don't know what fitstat is, sorry.

        Can I run postestimation commands like margins after firthlogit without worries?
        No. I would trust margins after firthlogit only under circumstances where I would use logit itself.
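
On the first question, one practical check is to put both sets of marginal effects side by side and judge whether the difference matters for your conclusions. A rough sketch, reusing the counts and variable names from #2 (with level entered as a numeric code for brevity) and assuming firthlogit is installed from SSC. The stored names mfx_logit and mfx_firth are made up for illustration:

```stata
* Example counts from #2, entered directly with numeric codes
clear
input byte participate byte level byte race int count
0 1 1  836
1 1 1  304
0 2 1 2041
1 2 1 1278
0 3 1  895
1 3 1  865
0 1 2   91
1 1 2   13
0 2 2  112
1 2 2   25
0 3 2   47
1 3 2   27
end

* Ordinary logit, with the marginal effects posted and stored
logit participate i.level##i.race [fweight=count], nolog
margins, dydx(race) at(level = (1(1)3)) noatlegend post
estimates store mfx_logit

* Firth estimates handed to logit so that margins can be used (as in #2)
firthlogit participate i.level##i.race [fweight=count], nolog
tempname B
matrix define `B' = e(b)
quietly logit participate i.level##i.race [fweight=count], from(`B', copy) iterate(0) nolog
margins, dydx(race) at(level = (1(1)3)) noatlegend post
estimates store mfx_firth

* Side-by-side marginal effects and standard errors
estimates table mfx_logit mfx_firth, b(%9.4f) se(%9.4f)
```

The standard errors in the second column come from the unpenalized logit information matrix evaluated at the Firth estimates, so I would treat them as approximate. If the two columns lead to the same substantive reading, the choice of estimator is probably not driving the result.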
