
  • Firthlogit Problems

    Hi all,
    I am trying to predict a dichotomous dependent variable (being a problem gambler) from six categorical explanatory variables in cross-sectional survey data. I am using firthlogit because the sample is large but has only a small number of positive outcomes for the dependent variable. I have run the firthlogit regression and now want to find the increase in the probability that someone is a problem gambler given that they are, for example, divorced.

    Firstly, if each variable has a reference category, does the margins command tell me the increase in the probability of being a problem gambler for someone who is divorced relative to the reference group (married/cohabiting)?
    Secondly, I have tried the command - margins, dydx(list of explanatory variables) - after the firthlogit regression, but its output is identical to the firthlogit regression output. Is there a way to obtain margins after firthlogit?
    Finally, I would like my results to use robust standard errors to guard against heteroskedasticity. Is there an option for this? The usual , robust and vce(robust) options both produce errors.

    Many Thanks

  • #2
    Originally posted by Benjamin Revell
    Firstly, if each variable has a reference category, does the margins command tell me the increase in the probability of being a problem gambler for someone who is divorced relative to the reference group (married/cohabiting)?
    Does your predictor list include a marital-status × category interaction term for each of the six? (You don't show anything.) If not, then you'll get only marginal effects and not any effect broken down by the other categorical predictors.

    Secondly, I have tried the command - margins, dydx(list of explanatory variables) - after the firthlogit regression, but its output is identical to the firthlogit regression output. Is there a way to obtain margins after firthlogit?
    This has come up on the list before. firthlogit isn't designed to be used with margins, because the likelihood function is asymmetric in the typical use case for firthlogit, which renders the coefficient standard errors and their Wald test statistics and confidence intervals meaningless.

    Nevertheless, you can avail yourself of the postestimation command by using the official logit estimation command as an intermediary. I illustrate the kludge in a toy example below. Begin at the "Begin here" comment; the stuff above it just sets up an illustrative dataset.
    Code:
    version 18.0
    
    quietly sysuse auto, clear
    
    summarize mpg, meanonly
    generate byte divorced = mpg > r(mean)
    
    summarize headroom, meanonly
    generate byte some_category = headroom > r(mean)
    
    rename foreign problem_gambler
    
    *
    * Begin here
    *
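    // Fit the penalized-likelihood model and save its coefficient vector
    firthlogit problem_gambler i.divorced##i.some_category, nolog
    tempname B
    matrix define `B' = e(b)
    
    // Pass the saved coefficients to official logit as starting values and take
    // zero iterations, so that margins can be run on the firthlogit estimates
    logit problem_gambler i.divorced##i.some_category, asis from(`B', copy) iterate(0)
    margins divorced#some_category // By default, this provides the predicted probabilities
    margins some_category, dydx(divorced) // Risk differences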
    
    // Aid to interpretation:
    version 16.1: table divorced some_category, contents(mean problem_gambler) format(%04.2f)
    
    exit
    This kludge is hinted at in the auxiliary file that accompanies the download and installation of firthlogit, which is a user-written command from SSC.
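
    As a minimal sketch of what a dydx() line like the one above reports for a factor variable, the following uses ordinary logit on the auto dataset. The indicator high_mpg and the covariate weight are illustrative assumptions and are not part of the toy example. For a factor variable, margins, dydx() reports a discrete change in the average predicted probability relative to the base level, so it should match the difference between the two rows of the first margins table.

    ```stata
    sysuse auto, clear

    // Hypothetical binary predictor, for illustration only
    summarize mpg, meanonly
    generate byte high_mpg = mpg > r(mean)

    logit foreign i.high_mpg weight, nolog

    margins high_mpg          // average predicted probability at each level
    margins, dydx(high_mpg)   // discrete change: level 1 minus base level 0
    margins r.high_mpg        // the same difference, requested as a contrast
    ```

    Comparing the three tables side by side is a quick way to confirm that the reported "dy/dx" for an indicator is a difference in probabilities rather than a derivative.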

    Finally, I would like my results to use robust standard errors to guard against heteroskedasticity. Is there an option for this? The usual , robust and vce(robust) options both produce errors.
    This comes up like clockwork on the list. Google firthlogit AND (robust OR cluster) AND site:statalist.org for the answer.



    • #3
      Hi James,

      I really appreciate the concise explanation, and I am sorry for the vagueness of the original post.

      To clarify, my command is:

      firthlogit PROBLEMGAMBLER Lowincome ib2.Sex ib7.ag16g10 i.Educ2 i.origin2 i.maritalstatus

      where PROBLEMGAMBLER is my binary dependent variable (0/1),
      Lowincome is binary, Sex is binary, and the rest of the explanatory variables have more than two categories.

      The intermediary commands worked great, but I am struggling with the interpretation. If I run margins after using your code with logit as an intermediary:

      margins, dydx(Lowincome ib2.Sex ib7.ag16g10 i.Educ2 i.origin2 i.maritalstatus)

      the note below the output says that dy/dx is the discrete change relative to the base level (the reference group).

      Does this mean that a dy/dx value of 0.1 for the age category (16-24) tells me there is a 10% increase in the predicted probability of being a problem gambler (the dependent variable) for someone in the 16-24 category relative to the reference category (65-74) of that explanatory variable (age)?
      This is what I understand a marginal effect to describe, and it is what I am trying to obtain.
      Also, I see that margins ideally shouldn't be used because of the separation, so is there a better alternative postestimation method that can be interpreted in a similar way?

      Many thanks!



      • #4
        Does this mean that a dy/dx value of 0.1 for the age category (16-24) tells me there is a 10% increase in the predicted probability of being a problem gambler (the dependent variable) for someone in the 16-24 category relative to the reference category (65-74) of that explanatory variable (age)?
        Almost, but not quite. In fact, probably what you meant by this is correct, but you said it badly.

        Following -logit-, with -margins age_category, dydx(some_variable)- giving a value of 0.1 in the age category 16-24 tells you that there is a 10 percentage point increased probability of being a problem gambler in the 16-24 category relative to the reference category (65-74).

        The distinction between percent and percentage points is crucial, though people misuse the terms commonly. If the probability of being a problem gambler in the reference category is 50%, a 10% increase would bring it to 55%, whereas a 10 percentage point increase is 60%. The -margins, dydx()- result you describe should be interpreted as the latter, not the former. And this interpretation is a correct implementation of the concept of marginal effect.
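
        The arithmetic behind that distinction can be sketched with a few display commands. The scalar names p_ref and effect are hypothetical, and the numbers are the illustrative ones used above (a 50% reference probability and a reported value of 0.1), not results from any model.

        ```stata
        // Illustrative numbers only
        scalar p_ref  = 0.50    // probability in the reference category
        scalar effect = 0.10    // value reported by margins, dydx()

        display %4.2f p_ref + effect         // 10 percentage point increase: 0.60
        display %4.2f p_ref * (1 + 0.10)     // 10 percent increase: 0.55
        display %4.1f 100 * effect / p_ref   // the same 0.10 as a relative change: 20.0 percent
        ```

        In this example a 10 percentage point increase from a 50% baseline corresponds to a 20 percent relative increase, which is why the two terms cannot be used interchangeably.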



        • #5
          Originally posted by Clyde Schechter
          Almost, but not quite. In fact, probably what you meant by this is correct, but you said it badly.

          Following -logit-, with -margins age_category, dydx(some_variable)- giving a value of 0.1 in the age category 16-24 tells you that there is a 10 percentage point increased probability of being a problem gambler in the 16-24 category relative to the reference category (65-74).

          The distinction between percent and percentage points is crucial, though people misuse the terms commonly. If the probability of being a problem gambler in the reference category is 50%, a 10% increase would bring it to 55%, whereas a 10 percentage point increase is 60%. The -margins, dydx()- result you describe should be interpreted as the latter, not the former. And this interpretation is a correct implementation of the concept of marginal effect.
          Thanks for confirming the interpretation, Clyde. I really appreciate it!
