interpretation of marginal effects in a binary model

Chloe Williamson

Join Date: Feb 2024

Posts: 13
#1

10 Apr 2024, 13:02

Hi,
I have a quick (and hopefully easy) question about the interpretation of my margins command. I have a model where I am trying to estimate the association between different types of shocks and the likelihood of child marriage. My dependent variable, child bride, is binary: it takes the value 0 if the girl does not get married between two waves of data, and 1 if she does.

I have run a logit model in Stata with shock101, shock102, shock112, shock113, and shock114 as the explanatory variables. Each shock is also binary, taking the value 0 if it didn't occur and 1 if it did.

After this, I have run the command 'margins, dydx(*) at(shock101=0 shock102=0 shock112=0 shock113=0 shock114=0)'

I think that this should give the marginal effect of each shock going from 0 to 1 (i.e. of the shock happening), but I am not entirely sure.

If I have a marginal effect of shock101 as 0.246, would it be correct to say that 'experiencing shock101 is associated with a 24.6% higher likelihood of marriage'?

Thank you in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 29963
#2

10 Apr 2024, 13:37

If I have a marginal effect of shock101 as 0.246, would it be correct to say that 'experiencing shock101 is associated with a 24.6% higher likelihood of marriage'?

The marginal effects are in the probability metric and they are additive, so 0.246 is a difference of 24.6 percentage points, not a 24.6% relative increase. Moreover, your -at()- option has constrained the values of all the other shocks to zero, so this marginal effect would not be applicable if any of them were 1.

It would be correct to say that, conditional on not experiencing shocks 102, 112, 113, and 114, the probability of experiencing child marriage among those experiencing shock 101 is 24.6 percentage points higher than for those not experiencing shock 101.
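
A minimal sketch of where that 24.6-percentage-point figure comes from, run as a do-file on simulated data. The variable names mirror the thread, but the data, seed, and coefficients are made up for illustration. One assumption worth flagging: the shocks are entered with the i. prefix, so that -margins, dydx()- reports the discrete change from 0 to 1. If they are entered without it, Stata treats them as continuous and -dydx()- reports a derivative evaluated at the -at()- values instead.

```stata
* Simulated data: names follow the thread, coefficients are hypothetical.
clear
set seed 12345
set obs 2000
foreach v in shock101 shock102 shock112 shock113 shock114 {
    generate byte `v' = runiform() < 0.3
}
generate byte child_bride = runiform() < invlogit(-2 + 1.2*shock101 ///
    + 0.5*shock102 + 0.4*shock112 + 0.3*shock113 + 0.6*shock114)

* i. marks each shock as categorical, so dydx() is a 0-to-1 contrast.
logit child_bride i.shock101 i.shock102 i.shock112 i.shock113 i.shock114

* Predicted probability with shock101 = 0 and = 1, other shocks held at 0.
margins, at(shock101=(0 1) shock102=0 shock112=0 shock113=0 shock114=0)

* The difference between those two probabilities, in probability units.
margins, dydx(shock101) at(shock102=0 shock112=0 shock113=0 shock114=0)
```

The second -margins- result should equal the gap between the two probabilities printed by the first, which is why it reads as percentage points rather than a percent change.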
Chloe Williamson

Join Date: Feb 2024

Posts: 13
#3

11 Apr 2024, 04:23

Hi, thank you for getting back to me!

Would you say that using average marginal effects would be better, then? Ideally I would like to compare the results from the logit model to the OLS model. Do you have any recommendations as to which presentation of the logit results would be best for this?
Clyde Schechter

Join Date: Apr 2014

Posts: 29963
#4

11 Apr 2024, 09:25

Well, your goal isn't entirely clear to me. Comparing a linear probability model (which is what I imagine you are referring to as an OLS model) to a logit model with the same variables is not commonly done. And in what respects do you want to compare them? Is the goal of your modeling to build a predictive model? Or are you trying to analyze some estimate of causal effects?

Then there are some technical issues here. In a linear probability model -regress outcome shock101 shock102 shock112 shock113 shock114-, if you run -margins, dydx(*) at(whatever)-, you will get the same results regardless of what you specify in the -at()- option. That's because it's a linear model with no interaction terms, so the coefficients are themselves the marginal effects and the linear relationship guarantees that the marginal effects are constant. In your logistic model with the same variables, the non-linearity of the logit link function causes the marginal effects (which are not the coefficients, nor even directly calculable from them independently of the data) to vary with the values of the model's explanatory variables. From a purely mathematical perspective, there exists some -at()- condition, not necessarily a plausible or sensible one, where the marginal effect of shock101 in the logit model matches the shock101 marginal effect in the linear model, but I don't know what that tells you. I suppose it means nothing.
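
To make the contrast concrete, here is a sketch in notation that does not appear in the thread and is assumed purely for illustration: $\beta_k$ are the logit coefficients, $\gamma_k$ the linear probability model coefficients, $x_1,\dots,x_5$ the five shock indicators (with $x_1$ standing for shock101), and $\Lambda$ the logistic function.

```latex
\begin{align*}
\text{Logit:}\quad
  & \Pr(y = 1 \mid x) = \Lambda\Big(\beta_0 + \sum_{k=1}^{5} \beta_k x_k\Big),
    \qquad \Lambda(z) = \frac{1}{1 + e^{-z}} \\[4pt]
\text{Effect of } x_1 \text{ going } 0 \to 1:\quad
  & \Delta_1(x_2,\dots,x_5)
    = \Lambda\Big(\beta_0 + \beta_1 + \sum_{k=2}^{5} \beta_k x_k\Big)
    - \Lambda\Big(\beta_0 + \sum_{k=2}^{5} \beta_k x_k\Big) \\[4pt]
\text{LPM:}\quad
  & \Pr(y = 1 \mid x) = \gamma_0 + \sum_{k=1}^{5} \gamma_k x_k
    \quad\Longrightarrow\quad \Delta_1 = \gamma_1
\end{align*}
```

Because $\Lambda$ is nonlinear, $\Delta_1$ in the logit changes with the values of the other shocks, whereas in the linear model the other terms cancel and only $\gamma_1$ remains. That is why the -at()- option matters after -logit- but not after -regress-.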

The point is that in the logistic model, the marginal effects are always conditional on something: there are no unconditional marginal effects in this model. Which condition is worth focusing on depends on substantive and contextual matters that you, or others in your field, are better equipped to advise you on than I am. The choice of the most sensible condition to focus on is not a purely statistical issue. If you are confident that your sample is highly representative of whatever population you want to draw a general conclusion about, and if the conclusion you want to draw is about a marginal effect of shock101, then, yes, the average marginal effect is likely your best estimator.
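
For reference, a sketch of what "average marginal effect" means for a binary regressor, in notation assumed here rather than taken from the thread: $\hat\beta_k$ are the estimated logit coefficients, $x_{ik}$ is the value of shock $k$ for observation $i$ (with $k = 1$ standing for shock101), $n$ is the sample size, and $\Lambda(z) = 1/(1 + e^{-z})$.

```latex
\[
\widehat{\mathrm{AME}}_1
  = \frac{1}{n} \sum_{i=1}^{n}
    \left[
      \Lambda\Big(\hat\beta_0 + \hat\beta_1 + \sum_{k=2}^{5} \hat\beta_k x_{ik}\Big)
      - \Lambda\Big(\hat\beta_0 + \sum_{k=2}^{5} \hat\beta_k x_{ik}\Big)
    \right]
\]
```

Each observation keeps its own observed values of the other shocks, and the 0-to-1 contrast for shock101 is averaged over the sample. The condition being averaged over is therefore the sample's own distribution of the other shocks, which is why representativeness of the sample matters.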

But if the purpose of your modeling is to build a predictive model, then I would not base my model selection on any kind of marginal effects calculation. I would look at the correspondence between predicted and observed outcomes, and which model gets that better (model calibration) and which model is better able to distinguish those who undergo child marriage from those who don't (model discrimination).
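
A rough sketch of how calibration and discrimination might be checked in Stata, run as a do-file on simulated data. The data, seed, and coefficients are made up. The choice of -estat gof-, -lroc-, and -roctab- is an assumption about which tools would fit here, not something prescribed in the thread.

```stata
* Simulated data: names follow the thread, coefficients are hypothetical.
clear
set seed 2024
set obs 2000
foreach v in shock101 shock102 shock112 shock113 shock114 {
    generate byte `v' = runiform() < 0.3
}
generate byte child_bride = runiform() < invlogit(-2 + 1.2*shock101 ///
    + 0.5*shock102 + 0.4*shock112 + 0.3*shock113 + 0.6*shock114)

* Logit: calibration, then discrimination.
logit child_bride i.shock101 i.shock102 i.shock112 i.shock113 i.shock114
estat gof          // Pearson goodness-of-fit over covariate patterns
lroc, nograph      // area under the ROC curve

* Linear probability model: lroc is not available after regress,
* so compute the ROC area from the fitted values instead.
regress child_bride i.shock101 i.shock102 i.shock112 i.shock113 i.shock114
predict double p_lpm
roctab child_bride p_lpm
```

With only five binary regressors there are at most 32 covariate patterns, so the Pearson version of -estat gof- (without the group() option) seemed the more natural choice for this sketch.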
Chloe Williamson

Join Date: Feb 2024

Posts: 13
#5

12 Apr 2024, 07:08

Thank you so much, what you're saying makes a lot of sense. I think the term "comparison" was misleading: I was told to use both the linear probability model and the logit model to see whether similar conclusions can be drawn, but I don't think I need to compare the two directly.
Thank you for all of the help, and I'll look into average marginal effects instead.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2121
#6

12 Apr 2024, 16:56

In economics it's common to compare the coefficients in an LPM to the comparable marginal effects from a nonlinear model. If the shock variables form a partition (that is, they are exhaustive and mutually exclusive), then the linear model and any nonlinear model will give identical estimates. Otherwise the estimates will differ, although I suspect they would often be close. Here it does seem like multiple shocks can occur at once.

To compare the linear model coefficients to an average marginal effect, it would be

Code:

logit y i.shock101 i.shock102 i.shock112 i.shock113 i.shock114
margins, dydx(shock101)

The other shocks are averaged out as shock101 goes from zero to one. That is comparable to a linear model coefficient.

If it is not logically possible for some combinations to occur, the conceptual problem is more difficult, and so are the calculations.
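
A minimal sketch of the side-by-side comparison, run as a do-file on simulated data. The data, seed, and coefficients are made up, and the vce(robust) option on the linear model is my addition rather than something stated in the thread.

```stata
* Simulated data: names follow the thread, coefficients are hypothetical.
clear
set seed 2024
set obs 2000
foreach v in shock101 shock102 shock112 shock113 shock114 {
    generate byte `v' = runiform() < 0.3
}
generate byte y = runiform() < invlogit(-2 + 1.2*shock101 ///
    + 0.5*shock102 + 0.4*shock112 + 0.3*shock113 + 0.6*shock114)

* Linear probability model: each coefficient is itself the marginal effect.
regress y i.shock101 i.shock102 i.shock112 i.shock113 i.shock114, vce(robust)

* Logit, then average marginal effects: no at() option, so the other
* shocks stay at their observed values and are averaged over the sample.
logit y i.shock101 i.shock102 i.shock112 i.shock113 i.shock114
margins, dydx(*)
```

The dy/dx column from -margins- can then be read against the -regress- coefficients row by row. In this simulation several shocks can occur together, so the two sets of estimates would be expected to be close but not identical.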