Cox regression - choice of independent variable

Julie Xiu

Join Date: May 2023
Posts: 14

20 Feb 2025, 07:57

I have data on 4 treatments (trt) and counts of 3 possible complications (c1, c2, c3), with no censored observations. I have used Cox regression to investigate the effect of the complications on the treatment time (et).

Code:

. stset et

Survival-time data settings

         Failure event: (assumed to fail at time=et)
Observed time interval: (0, et]
     Exit on or before: failure

--------------------------------------------------------------------------
         96  total observations
          0  exclusions
--------------------------------------------------------------------------
         96  observations remaining, representing
         96  failures in single-record/single-failure data
     46,172  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =     1,060

. stcox i.trt c.c1 c.c2 c.c3

        Failure _d: 1 (meaning all fail)
  Analysis time _t: et

Iteration 0:  Log likelihood = -345.60672
Iteration 1:  Log likelihood = -262.44595
Iteration 2:  Log likelihood = -251.05725
Iteration 3:  Log likelihood = -250.60846
Iteration 4:  Log likelihood = -250.60707
Refining estimates:
Iteration 0:  Log likelihood = -250.60707

Cox regression with Breslow method for ties

No. of subjects =     96                                Number of obs =     96
No. of failures =     96
Time at risk    = 46,172
                                                        LR chi2(6)    = 190.00
Log likelihood = -250.60707                             Prob > chi2   = 0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         trt |
          1  |      1.000  (base)
          2  |      0.042      0.017  -7.7458   0.000        0.019       0.093
          3  |      0.004      0.002  -9.6904   0.000        0.001       0.013
          4  |      0.000      0.000 -10.3820   0.000        0.000       0.002
             |
          c1 |      0.460      0.037  -9.5976   0.000        0.393       0.539
          c2 |      0.879      0.061  -1.8468   0.065        0.766       1.008
          c3 |      1.077      0.073   1.0923   0.275        0.943       1.231
------------------------------------------------------------------------------

My concern is that the complications are counts, and c2 and c3 have a large number of 0s. Do I need to allow for this in the regression and, if so, how? Or do I just include them as I have done?

Thank you,
Julie

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(trt c1 c2 c3) double et
1  5 0 1  185
1  9 1 0  285
1 15 6 1  487
1 15 7 2  453
1 12 5 6  395
1 14 4 0  483
1 11 5 0  356
1  5 0 0  168
1  9 4 0  312
1  8 1 0  283
1 11 0 0  343
1  9 3 1  303
1 15 5 2  483
1  5 0 1  182
1  5 0 0  168
1 15 3 0  640
1 11 2 1  334
1 14 1 0  437
1 10 3 1  364
1  8 3 0  262
1 11 1 4  343
1  9 2 0  325
1  9 1 1  371
2 14 0 4  465
2  8 2 3  439
2 11 6 0  608
2  7 0 1  335
2  7 1 0  355
2  9 1 0  412
2  7 1 2  316
2 10 3 1  455
2  9 0 0  435
2  7 1 0  438
2 12 5 3  592
2  9 1 2  437
2  4 1 0  208
2  6 0 2  291
2 11 1 1  578
2  8 2 3  393
2 12 4 0  565
2 10 5 1  503
2  3 0 0  175
2 10 7 2  574
2 11 2 1  524
2 14 0 0  636
2 11 2 0  545
2  7 2 1  336
2  8 0 0  391
3  4 1 0  353
3  8 5 0  543
3  5 1 1  343
3  9 0 3  593
3  5 0 2  381
3  7 2 7  484
3  5 0 2  477
3  5 1 1  393
3  9 4 1  559
3  7 0 3  479
3  6 4 1  431
3  6 1 4  411
3 11 2 0  761
3  6 0 1  398
3  8 0 1  492
3  4 0 3  298
3 10 4 2  693
3  7 6 1  536
3 14 4 0  923
3  8 0 1  529
3  6 1 2  395
3  6 0 1  397
3  5 0 1  364
4  3 0 1  290
4 12 3 4  954
4  5 0 2  473
4  7 1 5  586
4 11 4 8  951
4 11 2 0 1060
4  5 1 1  776
4  8 2 2  693
4 13 3 7 1027
4  4 0 2  375
4  6 1 1  416
4  3 1 1  302
4 10 5 0  896
4  6 0 3  498
4  7 1 1  580
4 10 2 2  877
4  7 0 3  573
4  5 0 0  424
4  7 0 2  566
4  6 1 2  511
4 11 3 4  873
4  6 1 1  513
4  6 2 1  540
4  6 2 1  549
4  8 3 1  668
end

Clyde Schechter

Join Date: Apr 2014

Posts: 29818
#2

20 Feb 2025, 09:34

I don't think the large number of zeroes in the c* variables is a problem, as long as there are more than just a handful of non-zero values in there somewhere.
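
One quick way to see how many non-zero values there are is sketched below. It assumes the -dataex- example from the first post is already in memory; the tabulation by treatment is an extra check added here, not something requested in the thread.

```stata
* Assumes the -dataex- example from post #1 has been run, so trt, c1-c3, and et are in memory.

* Full distribution of each complication count
tabulate c2
tabulate c3

* Number of observations with at least one complication of each type
count if c2 > 0 & !missing(c2)
count if c3 > 0 & !missing(c3)

* How the non-zero observations are spread across the treatments
tabulate trt if c2 > 0 & !missing(c2)
tabulate trt if c3 > 0 & !missing(c3)
```

If the non-zero values turned out to sit almost entirely in one treatment, the coefficient for that count would be hard to separate from the treatment effect.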

Where the model might have room for improvement is the assumption (which may or may not be reasonable; I'm not pre-judging the issue here) that the relationship between the log hazard ratio and the count of complications is linear. You should also run a proportional hazards (PH) test if you haven't already done so. If it doesn't reject the PH assumption on those variables, then you are probably OK with your model. If it does, you should look into non-linear transformations or interactions to improve the model.
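
As a rough sketch of one way to probe the linearity question directly (the PH test itself asks whether the hazard ratios are constant over time, which is a separate question), squared terms can be added for each count and tested jointly. The quadratic form and the stored-estimate names linear and quad are illustrative choices made here, not something from the thread, and the code assumes the -dataex- example from the first post is in memory.

```stata
* Assumes the -dataex- example from post #1 is in memory.
stset et

* Model as posted: each count enters linearly on the log-hazard scale
stcox i.trt c.c1 c.c2 c.c3
estimates store linear

* Same model with a squared term added for each count
* (one illustrative non-linear form; others are possible)
stcox i.trt c.c1##c.c1 c.c2##c.c2 c.c3##c.c3
estimates store quad

* Joint Wald test of the three squared terms
testparm c.c1#c.c1 c.c2#c.c2 c.c3#c.c3

* Likelihood-ratio test of the linear model against the quadratic one
lrtest linear quad
```

If neither test gives much evidence for the squared terms, that would be consistent with keeping the counts as simple linear terms. With 96 observations, though, these tests will have limited power.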

Julie Xiu

20 Feb 2025, 10:06

Thank you.

Code:

. estat phtest

Test of proportional-hazards assumption

Time function: Analysis time
------------------------------------------------
             |     chi2       df       Prob>chi2
-------------+----------------------------------
 Global test |    12.02        6          0.0615
------------------------------------------------

On this basis I will continue with the analysis. I do not think there is sufficient data to incorporate interactions.
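
The output above reports only the global test. Post #2 refers to the PH assumption "on those variables", so a covariate-by-covariate breakdown may also be worth a look. A minimal sketch, assuming the same data and model as in the first post:

```stata
* Assumes the -dataex- example from post #1 is in memory.
stset et
stcox i.trt c.c1 c.c2 c.c3

* Separate tests for each covariate, based on scaled Schoenfeld residuals,
* reported together with the global test
estat phtest, detail

* Optional visual check for one covariate:
* smoothed scaled Schoenfeld residuals plotted against time
estat phtest, plot(c1)
```

A global p-value just above 0.05 can mask a clear violation in a single covariate, which is what the detail option would reveal.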

Julie
