
  • Cox regression - choice of independent variable

    I have data on 4 treatments and counts of 3 possible complications, with no censoring, and I have used Cox regression to investigate the effect of the complications on treatment time.
    Code:
    . stset et
    
    Survival-time data settings
    
             Failure event: (assumed to fail at time=et)
    Observed time interval: (0, et]
         Exit on or before: failure
    
    --------------------------------------------------------------------------
             96  total observations
              0  exclusions
    --------------------------------------------------------------------------
             96  observations remaining, representing
             96  failures in single-record/single-failure data
         46,172  total analysis time at risk and under observation
                                                    At risk from t =         0
                                         Earliest observed entry t =         0
                                              Last observed exit t =     1,060
    
    . stcox i.trt c.c1 c.c2 c.c3
    
            Failure _d: 1 (meaning all fail)
      Analysis time _t: et
    
    Iteration 0:  Log likelihood = -345.60672
    Iteration 1:  Log likelihood = -262.44595
    Iteration 2:  Log likelihood = -251.05725
    Iteration 3:  Log likelihood = -250.60846
    Iteration 4:  Log likelihood = -250.60707
    Refining estimates:
    Iteration 0:  Log likelihood = -250.60707
    
    Cox regression with Breslow method for ties
    
    No. of subjects =     96                                Number of obs =     96
    No. of failures =     96
    Time at risk    = 46,172
                                                            LR chi2(6)    = 190.00
    Log likelihood = -250.60707                             Prob > chi2   = 0.0000
    
    ------------------------------------------------------------------------------
              _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             trt |
              1  |      1.000  (base)
              2  |      0.042      0.017  -7.7458   0.000        0.019       0.093
              3  |      0.004      0.002  -9.6904   0.000        0.001       0.013
              4  |      0.000      0.000 -10.3820   0.000        0.000       0.002
                 |
              c1 |      0.460      0.037  -9.5976   0.000        0.393       0.539
              c2 |      0.879      0.061  -1.8468   0.065        0.766       1.008
              c3 |      1.077      0.073   1.0923   0.275        0.943       1.231
    ------------------------------------------------------------------------------
    My concern is that the complications are counts, and c2 and c3 contain a large number of zeros. Do I need to allow for this in the regression and, if so, how? Or can I simply include them as I have done?

    Thank you,
    Julie
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(trt c1 c2 c3) double et
    1  5 0 1  185
    1  9 1 0  285
    1 15 6 1  487
    1 15 7 2  453
    1 12 5 6  395
    1 14 4 0  483
    1 11 5 0  356
    1  5 0 0  168
    1  9 4 0  312
    1  8 1 0  283
    1 11 0 0  343
    1  9 3 1  303
    1 15 5 2  483
    1  5 0 1  182
    1  5 0 0  168
    1 15 3 0  640
    1 11 2 1  334
    1 14 1 0  437
    1 10 3 1  364
    1  8 3 0  262
    1 11 1 4  343
    1  9 2 0  325
    1  9 1 1  371
    2 14 0 4  465
    2  8 2 3  439
    2 11 6 0  608
    2  7 0 1  335
    2  7 1 0  355
    2  9 1 0  412
    2  7 1 2  316
    2 10 3 1  455
    2  9 0 0  435
    2  7 1 0  438
    2 12 5 3  592
    2  9 1 2  437
    2  4 1 0  208
    2  6 0 2  291
    2 11 1 1  578
    2  8 2 3  393
    2 12 4 0  565
    2 10 5 1  503
    2  3 0 0  175
    2 10 7 2  574
    2 11 2 1  524
    2 14 0 0  636
    2 11 2 0  545
    2  7 2 1  336
    2  8 0 0  391
    3  4 1 0  353
    3  8 5 0  543
    3  5 1 1  343
    3  9 0 3  593
    3  5 0 2  381
    3  7 2 7  484
    3  5 0 2  477
    3  5 1 1  393
    3  9 4 1  559
    3  7 0 3  479
    3  6 4 1  431
    3  6 1 4  411
    3 11 2 0  761
    3  6 0 1  398
    3  8 0 1  492
    3  4 0 3  298
    3 10 4 2  693
    3  7 6 1  536
    3 14 4 0  923
    3  8 0 1  529
    3  6 1 2  395
    3  6 0 1  397
    3  5 0 1  364
    4  3 0 1  290
    4 12 3 4  954
    4  5 0 2  473
    4  7 1 5  586
    4 11 4 8  951
    4 11 2 0 1060
    4  5 1 1  776
    4  8 2 2  693
    4 13 3 7 1027
    4  4 0 2  375
    4  6 1 1  416
    4  3 1 1  302
    4 10 5 0  896
    4  6 0 3  498
    4  7 1 1  580
    4 10 2 2  877
    4  7 0 3  573
    4  5 0 0  424
    4  7 0 2  566
    4  6 1 2  511
    4 11 3 4  873
    4  6 1 1  513
    4  6 2 1  540
    4  6 2 1  549
    4  8 3 1  668
    end



  • #2
    I don't think the large number of zeroes in the c* variables is a problem, as long as there is more than just a handful of non-zero values in there somewhere.
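
    As a rough way to see how many non-zero values each count actually has, something like the following could be run. This is only a sketch, and it assumes the -dataex- example from post #1 is loaded in memory with the variable names trt, c1, c2 and c3 as posted.

    ```stata
    * Assumes the -dataex- example from post #1 is in memory
    * Frequency tables show how the counts are distributed, including the zeros
    tabulate c2
    tabulate c3

    * Number of observations with at least one complication of each type
    count if c2 > 0
    count if c3 > 0

    * Check that the non-zero values are not concentrated in a single treatment
    tabulate trt if c2 > 0
    tabulate trt if c3 > 0
    ```

    If the non-zero values turn out to be spread across all four treatments, that supports simply including the counts as they are.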

    Where the model might have room for improvement is the assumption that the relationship between the log hazard and the count of complications is linear (it may or may not be reasonable; I'm not pre-judging the issue here). You should run a proportional hazards (PH) test if you haven't already done so. If it doesn't reject the PH assumption on those variables, then you are probably OK with your model. If it does, you should look into non-linear transformations or interactions to improve the model.
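
    A minimal sketch of how both checks might be run, assuming the same data and model as in post #1. Note that the PH test asks whether each effect is constant over analysis time, whereas linearity of the counts is a question of functional form; the quadratic terms below are just one illustrative way of probing the latter, not the only option.

    ```stata
    * Assumes the -dataex- example from post #1 is in memory
    stset et
    stcox i.trt c.c1 c.c2 c.c3

    * PH test based on scaled Schoenfeld residuals;
    * -detail- reports a test for each covariate as well as the global test
    estat phtest, detail

    * Illustrative functional-form check (an assumption of this sketch):
    * add a squared term for each count and test the squared terms jointly
    stcox i.trt c.c1##c.c1 c.c2##c.c2 c.c3##c.c3
    testparm c.c1#c.c1 c.c2#c.c2 c.c3#c.c3
    ```

    A non-significant joint test on the squared terms would be consistent with the linear specification, though with 96 observations the power to detect curvature is limited.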



    • #3
      Thank you.
      Code:
      . estat phtest
      
      Test of proportional-hazards assumption
      
      Time function: Analysis time
      ------------------------------------------------
                   |     chi2       df       Prob>chi2
      -------------+----------------------------------
       Global test |    12.02        6          0.0615
      ------------------------------------------------
      On this basis I will continue with the analysis. I do not think there is sufficient data to incorporate interactions.

      Julie
