Conditional logistic regression

Julia Granerod

Join Date: Nov 2018

Posts: 4
#1


29 May 2019, 05:24

Hi!
I am new to this forum and looking for some advice please. I have run a conditional logistic regression analysis using discrete choice data. The coefficients from the output will be used as input to a separate analysis. For various reasons, there are two further things I want to try, but I am unsure whether I can do them in Stata and, if so, how. Any help would be appreciated.

My initial command looked something like this:

clogit choice var1_low var1_med var1_high var1_vhigh ///
    var2_low var2_med var2_high var2_vhigh ///
    var3_low var3_med var3_high var3_vhigh, group(obsid)

I want to rerun the regression analysis with two levels of the categorical variable held constant and re-estimate the other values based on this. For example, if I have low, medium, high, and very high as my levels, initially I used very high as my reference and estimated coefficients for the other levels of the variable. Is it possible to hold low and medium constant and estimate coefficients for high and very high?

Also, is it possible in the regression, rather than having coefficient weights for each categorical level, to fit an exponential function between the best and worst levels for each variable? If so, how?

Thanks in advance!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

29 May 2019, 12:34

Regarding #1, to the extent I understand your question, you can do it, but first you must revise your -clogit- command to the simpler version relying on factor-variable notation. This means eliminating the var1_low var1_med var1_high and var1_vhigh, etc. indicator variables. Instead, you need to have just var1, which takes on four numeric values: 0 = low, 1 = med, 2 = high and 3 = vhigh. (Similarly for var2 and var3). With that you revise your original regression to:

Code:

clogit choice i.var1 i.var2 i.var3, group(obsid)

With that notation, Stata will automatically use the lowest values of var1, var2, and var3 as the reference categories for representing var1, var2, and var3 as discrete variables.

Now, if, for example, you want to use high as the reference category for var1 and med as the base category for var2 (and stick with low as the base category for var3) you would do it as:

Code:

clogit choice ib2.var1 ib1.var2 i.var3, group(obsid)

The ib2 prefix on var1 tells Stata to use 2 (i.e. high) as the reference category for var1 in this analysis. Similarly, ib1 tells Stata to use 1 (i.e. med) as the reference category for var2.

Read -help fvvarlist- for more information on factor variable notation.
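
As a rough sketch of the conversion described above, the indicator variables could be collapsed into single categorical variables along these lines. It assumes exactly one of the four indicators equals 1 in every row, and the value label name lvl is made up for illustration.

```stata
* Sketch only: collapse four 0/1 indicators into one variable coded
* 0 = low, 1 = med, 2 = high, 3 = vhigh (the coding described above).
* "lvl" is a hypothetical value label name.
label define lvl 0 "low" 1 "med" 2 "high" 3 "vhigh"

foreach v in var1 var2 var3 {
    * Stop if a row does not have exactly one level switched on
    assert `v'_low + `v'_med + `v'_high + `v'_vhigh == 1
    generate byte `v' = 1*`v'_med + 2*`v'_high + 3*`v'_vhigh
    label values `v' lvl
}

* Very high (3) as the reference category for all three variables
clogit choice ib3.var1 ib3.var2 ib3.var3, group(obsid)
```

The ib3. prefix is the same mechanism as ib2. and ib1. above, just pointing at the very high level.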

But I don't understand what you mean by "hold low and medium constant and estimate coefficients for high and very high". Can you explain that more? You can only have one reference category in a discrete variable. Did you mean that you want to combine low and medium values into a single "non-high" category?
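
If combining is what is meant, a minimal sketch could look like the following. It assumes var1 is already coded 0 = low, 1 = med, 2 = high, 3 = vhigh as described earlier in this post; var1_c is a hypothetical name for the collapsed variable.

```stata
* Sketch only: merge low and med into one category of a new variable.
* var1_c is a hypothetical name, not something from the original data.
recode var1 (0 1 = 0 "low or med") (2 = 1 "high") (3 = 2 "vhigh"), generate(var1_c)

* Check the mapping before using it
tabulate var1 var1_c

* The merged low/med category (0) becomes the reference by default
clogit choice i.var1_c i.var2 i.var3, group(obsid)
```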

Also, is it possible in the regression, rather than having coefficient weights for each categorical level, to fit an exponential function between the best and worst levels for each variable? If so, how?

This one I don't understand at all. Low, medium, high, and very high are, at best, levels of an ordinal variable, and the exponential function requires a ratio-level variable in order to produce sensible results. You are (at least) two steps away from that on the hierarchy. Can you explain better what you are trying to accomplish here?
Julia Granerod

Join Date: Nov 2018

Posts: 4
#3

31 May 2019, 04:54

Thanks so much for your reply, much appreciated.

I realise on reading your answer that my second query didn’t really make sense.

Let me put it another way –

I have used the coefficients produced from the conditional logistic regression to produce weights for another analysis. I am also considering other options for assigning these weights.

For the seizures variable for example, I have levels “low,” “medium,” “high,” and “very high.” I was wondering whether, based on a weight of 0 for “low” and -0.245 for “very high,” we can estimate weights for “medium” and “high” using an exponential function? Is this possible?

Level	Low	Medium	High	Very high
Seizures	0	?	?	-0.245

Julia Granerod

Join Date: Nov 2018

Posts: 4
#4

31 May 2019, 05:51

About the first query –

Data is structured as such:

Patid	Obsid	Seiz_low	Seiz_med	Seiz_high	Seiz_vhigh	Choice	Const
1	1	0	0	1	0	0	1
1	1	0	1	0	0	1	0
1	2	0	0	0	1	1	1
1	2	1	0	0	0	0	0
2	3	0	0	1	0	0	1
2	3	0	1	0	0	1	0
2	4	0	0	0	1	0	1
2	4	1	0	0	0	1	0
3	5	0	0	1	0	1	1
3	5	0	1	0	0	0	0
3	6	0	0	0	1	0	1
3	6	1	0	0	0	1	0

Variables defined as follows:

Patid = identification variable unique to respondent

Obsid = indicates each unique choice made by respondent on survey

Choice = dependent variable, indicating choice of treatment A or B

Const = indicates whether row of data represents treatment A or B in a choice set

The conditional logistic regression command looked something like this –

clogit choice seiz_high seiz_med seiz_low const, group(obsid)

Given the data structure can you still combine two levels, say “low” and “medium” and make these the baseline? Or could we set the coefficients say for “low” and “medium” as 1 in the regression and estimate the coefficients for “high” and “very high” based on this?


Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

31 May 2019, 11:17

Given the data structure can you still combine two levels, say “low” and “medium” and make these the baseline? Or could we set the coefficients say for “low” and “medium” as 1 in the regression and estimate the coefficients for “high” and “very high” based on this?

Well, if you want to stay with this data structure, you could do

Code:

clogit choice seiz_high seiz_vhigh, group(obsid)

and that would treat both low and medium as a combined reference category.
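
For the other reading of the question, holding a coefficient at a preset value rather than merging levels, one possible sketch uses the offset() option of -clogit-, which enters a variable into the model with its coefficient constrained to 1. Everything here is an assumption for illustration: low is taken as the reference (coefficient 0), b_med is a made-up fixed value for medium, fixed_part is a hypothetical variable name, and the grouping variable and const term follow the poster's original command.

```stata
* Sketch only: fix the "medium" coefficient at a preset value and
* estimate "high" and "very high" freely.
* b_med = -0.10 is a hypothetical value, not taken from any output.
scalar b_med = -0.10

* The fixed contribution to the linear predictor; low contributes 0
generate double fixed_part = scalar(b_med) * seiz_med

* offset() adds fixed_part with its coefficient constrained to 1
clogit choice seiz_high seiz_vhigh const, group(obsid) offset(fixed_part)
```

The estimated coefficients for high and very high are then relative to low, conditional on the value imposed for medium.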

In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

When asking for help with code, always show example data. When showing example data, always use -dataex-.
Julia Granerod

Join Date: Nov 2018

Posts: 4
#6

31 May 2019, 14:05

That is so helpful - thank you so much!

And I will do as you advise next time.

Also, did you have any thoughts as well on the previous question I elaborated on regarding exponential functions?

Thanks again!
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#7

31 May 2019, 18:59

I still don't get what you want to do with regard to the exponential weighting. You are starting from a weighting of 0 and going to -0.245, but exponential functions never take on zero or negative values. So I really don't get what you mean here.
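
One possible reading, offered only as an assumption about what is being asked, is a shifted and rescaled exponential curve that passes through 0 at low and -0.245 at very high. The sketch below further assumes the four levels are equally spaced (0, 1, 2, 3), which an ordinal variable does not guarantee, and uses a made-up curvature k; neither assumption comes from the data.

```stata
* Sketch only; note that -clear- drops any data currently in memory.
clear
set obs 4
generate byte level = _n - 1            // 0 = low ... 3 = very high (assumed spacing)

scalar w_worst = -0.245                 // weight at "very high", from the post above
scalar k = 0.5                          // hypothetical curvature, chosen arbitrarily

* (exp(k*level) - 1) is 0 at level 0; dividing by its value at level 3
* rescales the curve so that it equals w_worst at "very high"
generate double weight = scalar(w_worst) * (exp(scalar(k)*level) - 1) / (exp(scalar(k)*3) - 1)

list level weight
```

The intermediate weights depend entirely on the choice of k and on the assumed spacing, so they are a modelling decision rather than something estimated from the choice data.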
ashraf abugroun

Join Date: Nov 2018

Posts: 37
#8

31 May 2019, 20:40

I wonder, after using -psmatch2- for 1:1 nearest neighbour matching, what would be the default grouping variable generated by -psmatch2- that can be used in conditional logistic regression?