Logistic regression models comparison

Sam Lee

Join Date: Apr 2020

Posts: 19
#1

Logistic regression models comparison

01 Apr 2020, 17:35

I am trying to run the same logistic regression model for 2 different populations, modeling each one separately. How should I start (step by step), and how should I compare the results across the two models?

Thank you
Clyde Schechter

Join Date: Apr 2014

Posts: 29813
#2

01 Apr 2020, 18:26

Here's an example of how to do this:

Code:

sysuse auto, clear
logistic foreign mpg headroom if rep78 == 5
estimates store rep78_5
logistic foreign mpg headroom if rep78 == 4
estimates store rep78_4
suest rep78_5 rep78_4, coefl
lincom _b[rep78_5_foreign:mpg] - _b[rep78_4_foreign:mpg]

Note also that another way of doing this is with interaction terms, which is in many respects simpler and more flexible in that you can constrain coefficients of variables you are not particularly interested in and don't think have different effects in the two populations to be equal. This gives a more efficient comparison of the variables you are interested in, if your assumptions are correct.

If you want advice more tailored to your actual data set and logistic regressions, you can post back with example data (use -dataex-) and your logistic regression commands (use code delimiters).
If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
2 likes
Comment
Sam Lee

Join Date: Apr 2020

Posts: 19
#3

01 Apr 2020, 21:01

Thank you for your help. Much appreciated.

1. Can I still use this code with categorical variables? For example, what if mpg and headroom were categorical variables?

2. How should I interpret this result?

. lincom _b[rep78_5_foreign:mpg] - _b[rep78_4_foreign:mpg]

( 1) [rep78_5_foreign]mpg - [rep78_4_foreign]mpg = 0

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         (1) |  -.4652467   .1714876    -2.71   0.007    -.8013562   -.1291372
------------------------------------------------------------------------------

3. Could you please show me the example with interaction terms?

Thank you so much!!!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29813
#4

01 Apr 2020, 21:33

1. Yes. Read -help fvvarlist- to learn about factor variable notation, and use it in your -logistic- commands.

2. The estimated difference between the coefficients of mpg (on the log-odds scale) in the rep78 = 5 and rep78 = 4 populations is -0.465... with a standard error of 0.171..., 95% CI -0.801... to -0.129... If you are interested in a test of the null hypothesis that they are equal, the p-value is also shown in the output there.

3.

Code:

sysuse auto, clear
logistic foreign i.rep78##(c.mpg c.headroom) if inlist(rep78, 4, 5)
margins rep78, dydx(mpg headroom) predict(xb)

That's a direct replication of the method shown in #2. If you wanted however to stipulate that the effect of headroom is the same regardless of the value of rep78 (which doesn't seem likely in this actual example, but, just to illustrate the code...)

Code:

sysuse auto, clear
logistic foreign i.rep78##c.mpg c.headroom if inlist(rep78, 4, 5)
margins rep78, dydx(mpg headroom) predict(xb)

By leaving headroom out of the interaction term, you constrain it to have the same coefficient regardless of rep78, so you get different values for the coefficients of mpg that are consistent with that constraint. The -suest- method in #2 is not capable of this.
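
As a minimal sketch of how the interaction model connects back to the -lincom- result in #2: in the fully interacted model, the coefficient on the rep78-by-mpg interaction term is itself the difference between the two populations' mpg coefficients. The sketch below uses -logit- rather than -logistic- so that results are shown as coefficients on the log-odds scale. It should reproduce the point estimate of the difference from #2, but the standard error can differ, because -suest- reports robust standard errors.

```stata
sysuse auto, clear

* Fully interacted model, restricted to the two populations being compared.
* rep78 == 4 is the base level, so the interaction term is (rep78 == 5) minus (rep78 == 4).
logit foreign i.rep78##(c.mpg c.headroom) if inlist(rep78, 4, 5)

* Difference in the mpg coefficient between rep78 == 5 and rep78 == 4
lincom 5.rep78#c.mpg
```

The same -lincom- command can be run after the constrained model (with headroom left out of the interaction) to get the difference in the mpg coefficients under that constraint.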
Comment
Wizaso Munthali

Join Date: Feb 2022

Posts: 7
#5

03 Feb 2022, 08:56

Hello Clyde Schechter, your post was very helpful to me. Similarly, I need to compare two logit models, in my case using choice (best-worst scaling / maximum difference) data. My model command line looks like this: "logit Choice v1-v28". v1 to v28 are the attributes from which a respondent picks a best and a worst choice from a set of 5. I ran the comparison test you show in #2 smoothly, like this:

logit Choice v1-v28 if Set == 1
estimates store Set_1

logit Choice v1-v28 if Set == 2
estimates store Set_2

suest Set_1 Set_2, coefl

lincom _b[Set_1_Choice:v1] - _b[Set_2_Choice:v1]
lincom _b[Set_1_Choice:v2] - _b[Set_2_Choice:v2]
etc...

What I'd like to know is:
1. Is this comparison test suitable for logit regression using best-worst data?
2. How can I run the test such that it picks all 28 attribute variables at one time, instead of me testing each set individually?
3. What is this method of testing called, could you refer me to some text material that I can use to write up on it, and to help me follow up #4 part 3.
I'm not a statistician, and only started programming recently, so #4 was quite hard for me to follow.

Thank you.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29813
#6

03 Feb 2022, 14:18

1. Is this comparison test suitable for logit regression using best-worst data?

I don't know. I don't work with best-worst data and I can't pretend to understand the implications. I can tell you that this approach is generically suitable for comparing coefficients across logistic regressions. However, I cannot say whether comparison of coefficients across logistic regressions itself is sensible for choice data. It might be more sensible to do a single multivariate logistic regression. I don't know.

2. How can I run the test such that it picks all 28 attribute variables at one time, instead of me testing each set individually?

I'm not sure I understand what you are asking here. If you mean that you would like to automate this process rather than having to write out 28 commands:

Code:

forvalues i = 1/28 {
    lincom _b[Set_1_Choice:v`i'] - _b[Set_2_Choice:v`i']
}

will do that.
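
If the question is instead read as asking for one joint test that all 28 coefficients are equal across the two sets, a possible sketch is to build the hypothesis up with the -accumulate- option of -test-. This assumes the -suest Set_1 Set_2- results from #5 are the active estimates, and that none of v1-v28 was omitted from either model (for example, for collinearity); whether such a joint test is sensible for best-worst data is a separate question that this does not answer.

```stata
* Assumes -suest Set_1 Set_2- has just been run, as in #5.

* Start the joint hypothesis with the first attribute...
test [Set_1_Choice]v1 = [Set_2_Choice]v1

* ...then add the remaining 27 equality constraints one at a time.
forvalues i = 2/28 {
    test [Set_1_Choice]v`i' = [Set_2_Choice]v`i', accumulate
}
```

The output of the final -test- in the loop is the joint Wald test of all 28 equalities; the earlier outputs are intermediate results.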

3. What is this method of testing called, could you refer me to some text material that I can use to write up on it, and to help me follow up #4 part 3.

What we are talking about here is generically referred to as effect-modification, or moderation (those are synonyms). It may have other names in some specific contexts. To better understand #4 you might benefit from reading https://www3.nd.edu/~rwilliam/stats2/l53.pdf and https://www3.nd.edu/~rwilliam/stats/Margins01.pdf.
1 like
Comment
Wizaso Munthali

Join Date: Feb 2022

Posts: 7
#7

03 Feb 2022, 17:13

Yes, that is what I meant for question number 2.
Thank you very much, this is very helpful!
Comment
Wizaso Munthali

Join Date: Feb 2022

Posts: 7
#8

07 Feb 2022, 12:26

Hello, could you confirm if this test is also referred to as the generalised Hausman test? Thanks.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29813
#9

07 Feb 2022, 12:49

The -suest- method shown in this thread is a generic method for testing whether the coefficient of a given variable or several variables in a specific regression model differs across certain subsets of the population.

In my discipline, epidemiology, references to Hausman tests (generalized or not) are extremely infrequent, almost non-existent. My understanding of the term "Hausman test" is that it tests the equality of coefficients of all of the explanatory variables in two different regression models involving the same explanatory and outcome variables, the different models being random effects vs fixed effects, and is used to choose between those models. It can be, and often is, implemented using -suest-. So, in that sense, one might refer to -suest- itself as a generalization of the Hausman test, but I have never seen or heard it called that. But perhaps in other disciplines that term is used.
2 likes
Comment
Wizaso Munthali

Join Date: Feb 2022

Posts: 7
#10

07 Feb 2022, 20:25

Understood, thank you!
Comment
Valerie LI

Join Date: Jul 2021

Posts: 2
#11

19 Apr 2022, 15:45

Hi Clyde, this is very helpful, and the code works for comparing coefficients from logistic models. However, if my logistic model has "vce(cluster firmID)", -suest- does not work. Is there a way to specify the cluster() option with -suest-?
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29813
#12

19 Apr 2022, 15:53

Yes. You run the original models without -vce(cluster firmID)-. Then you invoke -suest- and specify the -vce(cluster firmID)- option on the -suest- command itself.
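
A minimal sketch of that sequence, using the -union- example dataset (fetched with -webuse-, so it needs internet access) as a stand-in for the firm data. The variables union, age, grade, south, and idcode belong to that dataset, with idcode playing the role of firmID; the stored names s_south and s_other are made up for illustration.

```stata
webuse union, clear

* Fit each model WITHOUT a vce() option
logit union age grade if south == 1
estimates store s_south

logit union age grade if south == 0
estimates store s_other

* Request the cluster-robust variance on -suest- itself
suest s_south s_other, vce(cluster idcode)

* Compare the coefficient of grade across the two populations
lincom _b[s_south_union:grade] - _b[s_other_union:grade]
```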
Comment
Valerie LI

Join Date: Jul 2021

Posts: 2
#13

19 Apr 2022, 16:06

Thanks so much for your quick reply. It works!
Comment
