Firthlogit Problems

Benjamin Revell

Join Date: Feb 2024

Posts: 12
#1

14 Feb 2024, 10:21

Hi all,
I am currently trying to predict a dichotomous dependent variable (being a problem gambler) from 6 categorical explanatory variables using cross-sectional survey data. I am using firthlogit because the sample is large but has only a small number of positive outcomes for the dependent variable. I have run the firthlogit regression and now want to find the increase in the probability that someone is a problem gambler given that they are (for example) divorced.

Firstly, if I have a reference group for each category of each variable, does the margins command tell me the increased probability of someone being a problem gambler when they are divorced relative to the reference group (married/cohabiting)?
Secondly, I have tried using the - margins, dydx(list of explanatory variables) - command after the firthlogit regression, but this generates output identical to the firthlogit regression. Is there a way to find the margins from firthlogit?
Finally, I would like my results to use robust standard errors to guard against heteroskedasticity. Is there a command for this? The usual , robust or vce(robust) options give me errors.

Many Thanks
Joseph Coveney

Join Date: Apr 2014

Posts: 4374
#2

14 Feb 2024, 19:25

Originally posted by Benjamin Revell View Post

Firstly, if I have a reference group for each category of each variable, does the margins command tell me the increased probability of someone being a problem gambler when they are divorced relative to the reference group (married/cohabiting)?

Does your predictor list include a marital-status × category interaction term for each of the six? (You don't show your command.) If not, then you'll get only marginal effects and not any categorical-predictor-wise effect.

Secondly, I have tried using the - margins, dydx(list of explanatory variables) - command after the firthlogit regression, but this generates output identical to the firthlogit regression. Is there a way to find the margins from firthlogit?

This has come up on the list before. firthlogit isn't designed to be used with margins, because the likelihood function is asymmetric in the typical use case for firthlogit, which renders the coefficient standard errors and their Wald test statistics and confidence intervals meaningless.

Nevertheless, you can avail yourself of the postestimation command by using the official logit estimation command as an intermediary. I illustrate the kludge in a toy example below. Begin at the "Begin here" comment; the stuff above it is just to set up an illustrative dataset.

Code:

version 18.0

quietly sysuse auto, clear
summarize mpg, meanonly
generate byte divorced = mpg > r(mean)
summarize headroom, meanonly
generate byte some_category = headroom > r(mean)
rename foreign problem_gambler

*
* Begin here
*
firthlogit problem_gambler i.divorced##i.some_category, nolog
tempname B
matrix define `B' = e(b)
logit problem_gambler i.divorced##i.some_category, asis from(`B', copy) iterate(0)

margins divorced#some_category // By default, this provides the predicted probabilities

margins some_category, dydx(divorced) // Risk differences

// Aid to interpretation:
version 16.1: table divorced some_category, contents(mean problem_gambler) format(%04.2f)

exit

This kludge is hinted at in the auxiliary file that accompanies the download and installation of firthlogit, which is a user-written command from SSC.
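As a rough check on the kludge, the sketch below confirms that logit with from(..., copy) and iterate(0) simply holds the firthlogit coefficients rather than refitting them. It assumes firthlogit is already installed (ssc install firthlogit) and uses a cut-down version of the toy setup above; the single-predictor model is only an illustration, not anyone's actual analysis.

```stata
* Assumption: firthlogit is installed from SSC (ssc install firthlogit)
sysuse auto, clear
summarize mpg, meanonly
generate byte divorced = mpg > r(mean)     // toy binary predictor
rename foreign problem_gambler             // toy binary outcome

* Fit the penalized model and keep its coefficient vector
firthlogit problem_gambler i.divorced, nolog
tempname B
matrix `B' = e(b)

* Hand the coefficients to logit without letting it iterate
logit problem_gambler i.divorced, asis from(`B', copy) iterate(0)

* Maximum relative difference between logit's e(b) and firthlogit's e(b)
display mreldif(e(b), `B')
```

The displayed difference should be zero, or negligibly small, if the coefficients were carried over intact. Only the point estimates are carried over this way: the variance matrix that margins then uses is the one logit computes, so the caution above about Wald-type standard errors and confidence intervals still applies.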

Finally, I would like my results to use robust standard errors to guard against heteroskedasticity. Is there a command for this? The usual , robust or vce(robust) options give me errors.

This comes up like clockwork on the list. Google firthlogit AND (robust OR cluster) AND site:statalist.org for the answer.
Benjamin Revell

Join Date: Feb 2024

Posts: 12
#3

15 Feb 2024, 13:11

Hi Joseph,

Really appreciate the concise explanation, and sorry for the vagueness in the original post.

To clarify my command is:

firthlogit PROBLEMGAMBLER Lowincome ib2.Sex ib7.ag16g10 i.Educ2 i.origin2 i.maritalstatus

where PROBLEMGAMBLER is my binary dependent variable (0/1), Lowincome and Sex are binary, and the rest of the explanatory variables have more than two categories.

The intermediary commands worked great, but I am struggling with the interpretation. If I run margins after using your code with logit as an intermediary:

margins, dydx(Lowincome ib2.Sex ib7.ag16g10 i.Educ2 i.origin2 i.maritalstatus)

Below the output, it suggests that the dy/dx is the discrete change relative to the base level (the reference group).

Does this mean that a dy/dx value of 0.1 for the 16-24 age category tells me there is a 10% increase in the predicted probability of being a problem gambler (the dependent variable) for someone in the 16-24 category relative to the reference category (65-74) of that explanatory variable (age)?
This is what I understand a marginal effect to describe, and it is what I am trying to obtain.
Also, I see that margins ideally shouldn't be used because of the separation, so is there a better alternative for postestimation that can be interpreted in a similar way?

Many thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#4

15 Feb 2024, 14:17

Does this mean that a dy/dx value of 0.1 for the 16-24 age category tells me there is a 10% increase in the predicted probability of being a problem gambler (the dependent variable) for someone in the 16-24 category relative to the reference category (65-74) of that explanatory variable (age)?

Almost, but not quite. In fact, probably what you meant by this is correct, but you said it badly.

Following -logit-, with -margins age_category, dydx(some_variable)- giving a value of 0.1 in the age category 16-24 tells you that there is a 10 percentage point increased probability of being a problem gambler in the 16-24 category relative to the reference category (65-74).
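To make the "discrete change relative to the base level" reading concrete, here is a minimal sketch using the nlsw88 dataset that ships with Stata. The outcome (union) and the three-level factor (race) are stand-ins chosen purely for illustration and have nothing to do with the gambling data.

```stata
sysuse nlsw88, clear
logit union i.race age, nolog

* Average predicted probability of the outcome at each level of race
margins race

* Discrete change in that probability for each level relative to the base (race = 1)
margins, dydx(race)
```

Each dy/dx row from the second command should equal the corresponding average predicted probability from the first command minus the one for the base level, so it is read on the probability scale, in percentage points.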

The distinction between percent and percentage points is crucial, though people misuse the terms commonly. If the probability of being a problem gambler in the reference category is 50%, a 10% increase would bring it to 55%, whereas a 10 percentage point increase is 60%. The -margins, dydx()- result you describe should be interpreted as the latter, not the former. And this interpretation is a correct implementation of the concept of marginal effect.
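The arithmetic in that example can be reproduced in a few lines. This is only a worked illustration of the 50% baseline case described above; the scalar names are made up for the purpose.

```stata
scalar p0 = 0.50                      // baseline probability in the reference category
scalar after_pct = p0 * (1 + 0.10)    // 10 percent (relative) increase
scalar after_pp  = p0 + 0.10          // 10 percentage point (absolute) increase

display "10 percent increase:          " %4.2f after_pct
display "10 percentage point increase: " %4.2f after_pp
```

The first line displays 0.55 and the second 0.60, matching the 55% and 60% figures in the text.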
Benjamin Revell

Join Date: Feb 2024

Posts: 12
#5

16 Feb 2024, 02:20

Originally posted by Clyde Schechter View Post

Almost, but not quite. In fact, probably what you meant by this is correct, but you said it badly.

Following -logit-, with -margins age_category, dydx(some_variable)- giving a value of 0.1 in the age category 16-24 tells you that there is a 10 percentage point increased probability of being a problem gambler in the 16-24 category relative to the reference category (65-74).

The distinction between percent and percentage points is crucial, though people misuse the terms commonly. If the probability of being a problem gambler in the reference category is 50%, a 10% increase would bring it to 55%, whereas a 10 percentage point increase is 60%. The -margins, dydx()- result you describe should be interpreted as the latter, not the former. And this interpretation is a correct implementation of the concept of marginal effect.

Thanks for confirming the interpretation, Clyde. Really appreciate it!