The odds ratio, SD, CI is too small?

Bader Bin Adwan

Join Date: Apr 2021

Posts: 91
#1

03 Apr 2021, 17:35

Hello,

I am running a logistic regression to estimate the odds ratio between a binary outcome and a continuous variable, and the results I get are very small. For example, the odds ratio is 8.24e-08 and the SD is 4.25e-08, with a 95% CI of 6.55e-11 to 0.0001208.

Is there a better way to solve this issue apart from log-transforming the variable? And if a log transform is the only way to solve it, how can I interpret the results?

Thanks
Rich Goldstein

Join Date: Mar 2014

Posts: 4464
#2

03 Apr 2021, 17:55

What is the range of the continuous variable? Does a one-unit difference in the continuous variable matter in the real world? Consider rescaling by dividing by something (10, 100, 10000?). Without further information it is very hard to give better advice. Please read the FAQ, which has very good advice on asking good questions.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#3

03 Apr 2021, 17:59

The null value for an odds ratio is 1, not 0. So an odds ratio of 8.24e-08 is not "small" in the sense of almost no effect: it is enormous, but in a reduced probability direction. It says that even small increases in your predictor variable lead to almost no probability of a positive outcome. There are a number of possibilities here. One is that you are modeling an outcome which is almost always 0 but happens to be 1 at some very small values of the continuous predictor. Another possibility is that the scale of the continuous variable is inappropriate for the analysis. And if that is the case, a log transform might make matters worse, not better. Another possibility is some data error(s) that produced a highly influential observation that is distorting everything. To give more concrete advice it would be necessary to see example data, along with descriptive statistics for the outcome and continuous predictor. It would also be helpful to look at the graph produced by -lowess outcome continuous_var, logit-.
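
A minimal sketch of these checks, using the placeholder names outcome and continuous_var from the -lowess- command above (substitute the real variable names). It assumes the outcome is coded 0/1:

```stata
* distribution of the binary outcome
tabulate outcome
* range, percentiles, and extreme values of the predictor
summarize continuous_var, detail
* the same summary within each outcome level
bysort outcome: summarize continuous_var
* smoothed relationship on the log-odds scale
lowess outcome continuous_var, logit
```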
Comment

Bader Bin Adwan

Join Date: Apr 2021
Posts: 91
#4

03 Apr 2021, 18:26

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte group float fps
1 .1491078
1 .0410539
1 .0961356
1 .0440432
1 .0485944
1 .3304636
1 .0840391
1 .0613633
1 .0526481
1 .1971922
1 .0738812
1 .1518053
0 .0542198
1 .0945191
1 .1400147
1 .0630852
1 .0508285
1 .0680228
1  .083058
1 .0921024
1 .0557303
1 .0480706
1  .093758
1 .0895595
1 .0607143
1 .0502698
1 .0596198
1 .0758054
1 .0994444
0 .1289528
1 .1219793
1 .1776514
1 .0722118
1 .0647861
1 .0549093
1 .0696798
1 .1196871
1 .1655451
0 .1120318
1 .0325733
1 .0385409
0 .2013027
1 .0726375
1 .0662749
1 .0774922
1 .0735908
1 .0863183
1  .066577
1 .0743369
1 .1016949
1 .1229856
0 .0681034
1 .0626035
1  .156962
1 .0557873
1 .1065125
1 .1135612
1 .1207563
1 .0864312
1 .1162011
1 .0905537
1 .0728972
1 .1347974
1 .1386651
0  .098441
1 .0577838
1 .1196481
1 .1064916
1 .0630386
0 .0715643
0 .0433135
0 .0718835
1 .0526946
1 .1023371
1 .0632832
0 .0711144
1 .0879387
1 .1018519
1 .0956392
0 .0517711
0 .1035422
0 .0705446
1 .1046987
1 .2906195
0 .0732839
1  .050216
1 .0872954
1 .1492958
1 .0494137
1 .0625889
0 .1436301
1 .0839552
0 .0994152
1  .066092
1 .1069559
0 .0853041
0 .0585034
1 .1407767
1  .058663
0 .1839599
end

Here is a sample of 100 obs of the data. The outcome is group, coded 0/1. The range of fps in the full data is [.0325733, 1.645762].

The command only allows 100 observations, and when I run the model on this sample, the results are still unusual but not as extreme as with the original data!

Many thanks for your help.

Last edited by Bader Bin Adwan; 03 Apr 2021, 18:29.

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

03 Apr 2021, 18:41

Well, from these 100 observations, nothing seems glaringly wrong. The outcome variable is a little lopsided, 19 0's and 81 1's, but that's certainly not something that -logistic- can't handle. The fps range from .03 to 1.65 is certainly appropriate, so this is not a scaling issue. I suppose it is possible that in the rest of the data, there are almost no more 1's, and the values of fps are larger than observed in the example. That could produce some problems, especially if the data set is large.

I ran the lowess plot on the example data: it has a V-shape, which suggests that your logistic model is probably not appropriate to this data, but, again, I don't see anything in it that would account for an OR that close to zero.

A somewhat easier plot to interpret would be: -dotplot fps, over(group)-. It's possible that will show that the data is pathological when applied to the whole data set: in the example data it looks OK.
Comment
Bader Bin Adwan

Join Date: Apr 2021

Posts: 91
#6

03 Apr 2021, 18:54

The data set is not large; it has around 700 obs. In group there are 220 1's and 480 0's. The min and max for fps are 0.032 and 0.44 if group==1, and 0.040 and 1.64 if group==0.

Here is the dotplot.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#7

03 Apr 2021, 19:03

Well, you clearly have an outlier with a very large fps value and group = 0. I suspect that data point is the culprit here. The question is: is that a data error, in which case you must fix it or remove it? Or is it correct data? If it is correct, a logistic model is simply not going to work for this data.
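
One way to inspect such a point and set it aside, sketched with the variable names from the -dataex- example. The cutoff of 1 is only an assumption based on the ranges reported in #6 (fps reaches 1.64 in group 0 but only 0.44 in group 1); choose it from your own plot:

```stata
* show the suspect observation(s); the cutoff of 1 is an illustrative assumption
list group fps if fps > 1 & !missing(fps)
* refit without the suspect point, leaving the data set itself unchanged
logistic group fps if fps <= 1
* compare with the fit on all observations
logistic group fps
```

Using an -if- condition rather than -drop- keeps the original data intact, so the two fits can be compared side by side.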
Comment
Bader Bin Adwan

Join Date: Apr 2021

Posts: 91
#8

03 Apr 2021, 19:16

I removed that outlier and ran logistic again, but the results did not change. When I initially log-transformed the variable, the results made sense. I am not sure what the problem is exactly; I guess the values are skewed!

Thanks
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#9

03 Apr 2021, 19:20

I removed that outlier and ran logistic again, but the results did not change.

I find that very hard to believe. Maybe it wouldn't entirely solve your problem, but I would expect the results to change a lot. From the looks of the dotplot, I would expect that with the outlier removed the OR would be just a small amount less than 1. Can you show the code and output?
Comment
Bader Bin Adwan

Join Date: Apr 2021

Posts: 91
#10

03 Apr 2021, 19:27

Here are the results!
Comment
Bader Bin Adwan

Join Date: Apr 2021

Posts: 91
#11

03 Apr 2021, 19:33

Here are the results without omitting the outlier, Obs = 696. The results do not change at all!
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#12

03 Apr 2021, 19:40

OK, now I see it. With that one point removed, we have a scaling problem. A 1 unit change in fps is a change that is almost twice as large as the entire range of the data. The data do have group leaning towards 0 with increasing values of fps, but now when you imagine a unit change in fps, you are extrapolating way beyond the range of the data. It is not reasonable to use fps in its current scale in this model. I suggest you rescale fps by a factor of 10, so that in the new units it ranges from approximately 0 to approximately 5. If you do that, you will get an odds ratio of about 0.3, which is more sensible.
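
The rescaling can be sketched as follows, using the variable names from the -dataex- example; fps10 is just an illustrative name. Because the coefficient on fps10 is the coefficient on fps divided by 10, the rescaled odds ratio is the original one raised to the power 0.1. The fit itself is unchanged and only the unit of comparison differs:

```stata
* original scale: odds ratio per 1-unit change in fps
* (run on the data after the outlier has been dealt with)
logistic group fps
* the same model expressed per 0.1-unit change in fps
display exp(_b[fps]/10)

* rescale so that one unit of fps10 equals 0.1 units of fps
generate fps10 = 10*fps
logistic group fps10
```

The odds ratio reported by the second -logistic- should match the value shown by -display-.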
Comment
Bader Bin Adwan

Join Date: Apr 2021

Posts: 91
#13

03 Apr 2021, 19:46

So, if I use gen logfps = ln(fps) and then run the logistic, the odds ratio becomes 0.16 with a 95% CI of 0.11-0.24. Does that mean that for a 10-fold increase in fps the odds ratio is 0.16? Or is there a better way (command) to rescale fps as you suggested?

Appreciate your support
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#14

03 Apr 2021, 22:42

Well, by using a log transformation you make it more complicated. Your proposed interpretation is way off base.

A 10 fold increase in fps means that logfps increases by ln(10) = 2.3. An odds ratio of 0.16 corresponds to a logistic regression coefficient of ln(0.16) = -1.83. So the log odds decreases by 2.3*1.83 or 4.2. Which in turn means that the odds are multiplied by exp(-4.2) = 0.015. I don't know if that's helpful or if anyone who hasn't gone through the analysis would find it understandable.

Is there a reason you don't want to just change the scale of fps by a factor of 10? Just -gen fps10 = 10*fps- and then do the logistic regression that way? That seems a lot simpler to me, and then you'll just be able to say that a 0.1 unit increase in fps (one unit of fps10) is associated with an odds ratio of (whatever, probably something around 0.3).

Anyway, a key thing to remember here is that odds ratios for continuous predictors are slippery: they depend very sensitively on the functional form and scale of the variable! Which also means that you can't really understand such an odds ratio unless you know what the scale of the predictor variable is.
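
The arithmetic for the log-transformed model can be checked with -display-. This is only a sketch built on the odds ratio of 0.16 quoted in #13. The base-2 variable log2fps is an illustrative alternative, not something used earlier in the thread: with a base-2 log, one unit is a doubling of fps, which may be easier to report.

```stata
* coefficient implied by an odds ratio of 0.16 per unit of ln(fps): about -1.83
display ln(0.16)
* odds ratio for a 10-fold increase in fps: about 0.015
display exp(ln(10)*ln(0.16))
* odds ratio for a doubling of fps: about 0.28
display exp(ln(2)*ln(0.16))

* base-2 log, so the reported odds ratio is per doubling of fps
generate log2fps = ln(fps)/ln(2)
logistic group log2fps
```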
Comment
