
  • Number of participants after Cox regression with listwise deletion of missing data.

    Dear Statalist,

    I am using multivariable Cox regression. My main exposure (TS_WHO_gold) is a categorical variable with three levels (normal, osteopenia, osteoporosis). Several of the covariates in the model have missing data, and those observations are dropped by listwise deletion. My question is: how can I find the count for each exposure category in the sample actually used by the Cox regression? I tried simply using -tab- as shown below, but its total (2,686) does not match the number of observations (2,346) in the -stcox- output (see below).

    Code:
    . tab TS_WHO_gold if BMIcat_gold!=. & SmoStatPackYrs_gold_missing!=. & SES_gold!=. & COP
    > Dcat_gold!=. & physact_gold!=. & alcohol_gold!=. & PartAg_gold!=. & CVD_gold!=. & canc
    > er_gold!=. & chrondisADL_gold!=. & diabetes_gold!=. & musc_skel_gold!=.
    
         WHO BMD |
      categories |
    for the HUNT |
     COPD cohort |
           using |
     forearm and |
       total hip |
        DXA meas |      Freq.     Percent        Cum.
    -------------+-----------------------------------
          normal |      1,737       64.67       64.67
      osteopenia |        565       21.03       85.70
    osteoporosis |        384       14.30      100.00
    -------------+-----------------------------------
           Total |      2,686      100.00

    Code:
    . stcox i.TS_WHO_gold i.Sex ib2.BMIcat_gold i.SmoStatPackYrs_gold_missing i.SES_gold i.C
    > OPDcat_gold i.physact_gold i.alcohol_gold c.PartAg_gold i.CVD_gold i.cancer_gold i.chr
    > ondisADL_gold i.diabetes_gold i.musc_skel_gold if (goldcopd_HUNT==1 | goldcopd_HUNT==2
    > ) & (TS_HUNT!=.) & (PartAg_gold>=40.0 & PartAg_gold<=85.0)
    
             failure _d:  RegisStat == 5
       analysis time _t:  (enddate-origin)/365.25
                 origin:  time PartDat_gold
                     id:  PID_107945
    
    Iteration 0:   log likelihood = -8426.7323
    Iteration 1:   log likelihood = -7838.0546
    Iteration 2:   log likelihood =  -7797.353
    Iteration 3:   log likelihood = -7797.1758
    Iteration 4:   log likelihood = -7797.1757
    Refining estimates:
    Iteration 0:   log likelihood = -7797.1757
    
    Cox regression -- Breslow method for ties
    
    No. of subjects =        2,346                  Number of obs    =       2,346
    No. of failures =        1,187
    Time at risk    =  27880.69268
                                                    LR chi2(29)      =     1259.11
    Log likelihood  =   -7797.1757                  Prob > chi2      =      0.0000

    Best regards,
    Sigrid

  • #2
    After running the Cox regression, you can do this:
    Code:
    tab TS_WHO_gold if e(sample)
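The -tab- in #1 probably disagrees with -stcox- because its if-condition leaves out some of what the model applies: Sex is not checked for missing values, the model's own restrictions (goldcopd_HUNT, TS_HUNT, the PartAg_gold age range) are absent, and -stcox- also drops observations that -stset- excluded (_st == 0). If you want the estimation sample before fitting the model, a minimal sketch with -mark- and -markout- is shown below. It assumes Stata's bundled cancer.dta (variables studytime, died, drug, age) purely for illustration, injects artificial missing values, and uses a hypothetical marker variable named touse.

```stata
* Illustration only: Stata's bundled cancer.dta stands in for your data
sysuse cancer, clear
stset studytime, failure(died)

* Hypothetical missingness so that listwise deletion actually happens
replace age = . in 1/5

* touse = 1 for observations meeting the model's if-restrictions
* (here only _st == 1; add your own, e.g. the goldcopd_HUNT condition)
mark touse if _st == 1
* Set touse = 0 wherever any model variable is missing
* (list plain variable names; no i. or c. prefixes)
markout touse drug age

stcox i.drug age

* Both tabulations should now show the same counts
tab drug if touse
tab drug if e(sample)
```

In practice e(sample) after the model is the simplest route; the -markout- approach is mainly useful when you need the counts before estimation or want to check which restriction explains a discrepancy.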



    • #3
      Great, thank you!



      • #4
        Following on from the question above: how can I see the number of events (failures) in each tertile of the exposure after running the Cox model, given that some covariates have missing data? Much appreciated!



        • #5
          Code:
          xtile tercile = exposure_variable if e(sample), nq(3)
          tabstat _d if e(sample), by(tercile) statistics(sum)
          That said, where are you going with this? The number of failures in each exposure tertile strikes me as a statistic that is probably meaningless and readily subject to misleading interpretation. If you say what you're trying to figure out about your analysis, someone may be able to offer a better way to go about it.



          • #6
            Thank you. I am using some blood biomarkers (e.g., LDL-c) as exposures, and I created tertiles to examine whether one category affects the outcome (a disease) more than the others; in effect, to check for non-linearity.
            The command I found suitable for my question above is:

            Code:
            tab exposure_var_tertile _d if e(sample)
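A self-contained sketch of the tertile workflow discussed in #5 and #6 follows. It again assumes Stata's bundled cancer.dta with artificial missing values, and treats age as a stand-in exposure; the names insample and age_t3 are hypothetical. Saving e(sample) to a variable right after -stcox- guards against it being replaced by a later estimation command.

```stata
* Illustration only: cancer.dta and age stand in for your data and exposure
sysuse cancer, clear
stset studytime, failure(died)
replace age = . in 1/5            // hypothetical missing data

stcox i.drug age
* Keep a copy of the estimation sample before anything overwrites e(sample)
generate byte insample = e(sample)

* Tertiles computed only within the estimation sample
xtile age_t3 = age if insample, nq(3)

* Failures (_d == 1) and censored (_d == 0) observations per tertile
tab age_t3 _d if insample, row

* Failure counts and number of observations per tertile
tabstat _d if insample, by(age_t3) statistics(sum n)
```

The cross-tabulation shows censored observations next to the failures, which is part of the context #7 asks for; it still says nothing about time at risk.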



            • #7
              My concern in #5 was not about the tertiles of the exposure variable but about getting counts of failure events. How are you going to use those results? The number of failures in a group is not, by itself, useful information. It is interpretable only in the context of both the time at risk associated with those failures and the number of censored observations. Comparing the number of failures across tertiles without appropriately accounting for these other quantities can lead to very misleading conclusions. Proceed with extreme caution.
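One way to put failure counts in the context of time at risk is -strate-, which reports for each group the number of failures (D), the person-time (Y) and the failure rate with a confidence interval. The sketch below assumes Stata's bundled cancer.dta, artificial missing values, age as a stand-in exposure, and hypothetical variable names insample and age_t3; it is an illustration, not a substitute for the modelled hazard ratios.

```stata
* Illustration only: cancer.dta and age stand in for your data and exposure
sysuse cancer, clear
stset studytime, failure(died)
replace age = . in 1/5            // hypothetical missing data

quietly stcox i.drug age
generate byte insample = e(sample)
xtile age_t3 = age if insample, nq(3)

* Failures, person-time at risk and rate per 100 units of analysis time,
* by exposure tertile, restricted to the estimation sample
strate age_t3 if insample, per(100)
```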

