  • Measures of fit for a logistic regression model

    Hi all,

    I am conducting a study in which I have measured 2 variables. I know that theoretically there is a causal relation between the two variables, such that one of them (cause, C) leads to the other (effect, E). I measured C at time point T1 (Ct1), whereas E was measured at two time points, T1 and T2 (Et1 and Et2, respectively). C is a categorical ordinal variable with 4 levels/categories and E is a binary variable.

    I have 600 unique subject IDs in total, and from each ID the variables Ct1, Et1, and Et2 were measured for 8 different body parts, yielding 4,800 entries in total.

    I'm using logistic regression to estimate odds ratios for how C (measured at T1) predicts the development of E at T2, adjusted for the presence of E at T1. Put another way, I want to see how C predicts development of new E at T2, adjusted for preexisting E at T1.


    Code:
     . logistic Et2 i.Ct1 Et1, vce(cluster ID)
    
    Logistic regression                             Number of obs     =      4,800
                                                    Wald chi2(4)      =     245.01
                                                    Prob > chi2       =     0.0000
    Log pseudolikelihood = -386.54839               Pseudo R2         =     0.3717
    
                                      (Std. Err. adjusted for 600 clusters in ID)
    -----------------------------------------------------------------------------
                 |               Robust
             Et2 | Odds Ratio   Std. Err.      z    P>|z|    [95% Conf. Interval]
    -------------+---------------------------------------------------------------
             Ct1 |
               1 |   3.198967   1.232098     3.02   0.003     1.503714    6.805409
               2 |   4.993883   3.840523     2.09   0.037      1.10618    22.54504
               3 |   55.21636   33.82345     6.55   0.000     16.62088    183.4348
                 |
             Et1 |   53.07789   20.83338    10.12   0.000     24.59303    114.5553
           _cons |   .0118504   .0019137   -27.47   0.000     .0086352    .0162627
    -----------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    Despite a large sample, some of the cells have small numbers:

    Code:
    . tab Ct1 Et2
    
               |          Et2
           Ct1 |         0          1 |     Total
    -----------+----------------------+----------
             0 |     4,438         67 |     4,505
             1 |       198         38 |       236
             2 |        22          8 |        30
             3 |         6         23 |        29
    -----------+----------------------+----------
         Total |     4,664        136 |     4,800
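
    For intuition about how much noise those small cells carry, the crude (unadjusted) odds ratios and Woolf confidence intervals can be computed straight from the cross-tab; a minimal Python sketch using the counts above (illustrative only — these are not the model's cluster-adjusted estimates):

    ```python
    import math

    # Counts from -tab Ct1 Et2- above: (Et2==0, Et2==1) for each level of Ct1
    counts = {0: (4438, 67), 1: (198, 38), 2: (22, 8), 3: (6, 23)}

    ref_no, ref_yes = counts[0]                 # Ct1 == 0 is the reference level
    for level in (1, 2, 3):
        no, yes = counts[level]
        or_ = (yes * ref_no) / (no * ref_yes)   # crude odds ratio vs. level 0
        # Woolf standard error on the log scale and large-sample 95% CI
        se = math.sqrt(1 / yes + 1 / no + 1 / ref_yes + 1 / ref_no)
        lo = math.exp(math.log(or_) - 1.96 * se)
        hi = math.exp(math.log(or_) + 1.96 * se)
        print(f"Ct1=={level}: crude OR = {or_:7.2f}  95% CI [{lo:7.2f}, {hi:7.2f}]")
    ```

    The very wide interval for Ct1==3 (driven by the cell with only 6 non-events) shows how little information those sparse cells contain, independent of any formal goodness-of-fit test.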
    My question is: should I test model fit when I include only two predictors in the model?

    As far as I understand, the Pearson chi-squared goodness-of-fit test is problematic if there are few observations for some values of the predictor variable(s), and the Hosmer–Lemeshow test is best with more than five predictors and works best with continuous predictors. So neither of the two tests seems well suited to my case, but are there any other diagnostic tests I should run?

    Also, if you have any concerns about my approach in general, or if there is other model testing I should do, I would appreciate it if you let me know.

    Best wishes,
    Jane

  • #2
    I think that the concerns you have about the Pearson chi-square GOF test and the Hosmer-Lemeshow test refer to the fact that both produce test statistics with approximately chi-square sampling distributions, and that the approximation is not so good under the circumstances you face.

    That is all true. But, for my part, I have always felt that it is a category error to think of goodness of fit as a yes-no phenomenon. Both of these statistics, if you ignore the issue of their "significance," are clearly calculated as direct measures of the agreement between the data and the model predictions. They are reasonable definitions of goodness of fit under any circumstances. Indeed, if you ponder for a long time the different ways one might assess the extent of agreement between the model predictions and the data, you don't come up with very many viable alternatives (although there are some). Whatever measures of fit you use, however, you cannot escape the fact that small cells produce noisy results.

    My own approach to assessing calibration in logistic models is usually to calculate observed and expected outcome probabilities in groups defined by percentiles of risk (sometimes the classical deciles, sometimes more or fewer groups depending on the granularity of the data and the sample size) and look at O and E in each of those groups. I then explore them (often graphically) to try to learn where the model fits the data well and where it looks like it needs improvement. Often knowing where the model fits best and not so well enables you to make a good guess about what changes to the model would improve it. And, in any case, it leaves you in a good position to appraise how well, and where, the model fits the data. Sometimes when the fit looks poor in a small cell, looking at (O-E)^2/E for that cell as a 1 df chi-square statistic is helpful in appraising to what extent that degree of poor fit is just consistent with the noisiness of a small cell's results. (And if the cell is so small that (O-E)^2/E isn't remotely like a chi-square statistic, you can always calculate the likelihood of observing O as a binomial outcome with expected value E in a sample of that size.)
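
    As a sketch of the approach just described (illustrative Python; the simulated risks here stand in for a model's fitted probabilities, which in Stata you would get from -predict- after -logistic-):

    ```python
    import random

    random.seed(0)

    # Simulated stand-ins for real data: in practice p_hat would be the model's
    # fitted probabilities and y the observed binary outcome for each observation.
    n = 4800
    p_hat = sorted(random.betavariate(0.5, 10) for _ in range(n))  # skewed, rare-ish risks
    y = [random.random() < p for p in p_hat]

    # Split into deciles of predicted risk and compare observed (O) with
    # expected (E) event counts within each group.
    k = 10
    size = n // k
    for g in range(k):
        lo, hi = g * size, (g + 1) * size
        O = sum(y[lo:hi])               # observed events in the group
        E = sum(p_hat[lo:hi])           # expected events under the model
        chi1 = (O - E) ** 2 / E         # 1-df chi-square contribution for the group
        print(f"decile {g}: O={O:4d}  E={E:7.1f}  (O-E)^2/E={chi1:6.2f}")
    ```

    Per-group O and E (plotted or tabulated) show where the model is well calibrated and where it drifts; for a very sparse group, the (O-E)^2/E line can be replaced by the exact binomial probability of observing O events in that group given mean risk E/size.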

    Notice what I don't do. I don't bother with the total (O-E)^2/E chi-square statistic, as I think the summary obliterates important details. I don't look at the p-value at all, because I think of goodness of fit as a quantitative thing, a matter of degree and not uni-dimensional, and not a yes-no that can be "tested." (I think, by the way, that goodness of fit "tests" are an outstanding example of the misuse of null hypothesis significance testing. The null hypothesis that the model is correct is almost always false a priori. The question is not whether the model is right, we know it isn't, but whether it is useful. Hat tip Box.)



    • #3
      Dear Clyde,

      Thanks a lot for your reply - I really appreciate it.

      Kind regards,

      Jane

