
  • Difference in N of obs. in Cox model

    Hi,

    I have a problem regarding the difference in the number of observations between the Cox model and the frequency analysis by year.

    My data set covers the 2009 to 2019 waves (years).

    Year     Freq.   Percent      Cum.
    2009       104     11.30     11.30
    2010        95     10.33     21.63
    2011        95     10.33     31.96
    2012        91      9.89     41.85
    2013        86      9.35     51.20
    2014        81      8.80     60.00
    2015        77      8.37     68.37
    2016        76      8.26     76.63
    2017        74      8.04     84.67
    2018        72      7.83     92.50
    2019        69      7.50    100.00
    Total      920    100.00
    As you can see from above, the maximum number of observations was 104 in 2009.

    However, when I run the Cox model, the reported number of observations (296) is larger than that.
    Can somebody explain why this could happen?

    The dependent variable is exit from medical benefit (coded as medical_final):

    medical_final    Freq.   Percent      Cum.
                0      675     73.37     73.37
                1      245     26.63    100.00
            Total      920    100.00
    . stset tf, failure(medical_final=1)
    failure event: medical_final == 1
    obs. time interval: (0, tf]
    exit on or before: failure
    920 total observations
    0 exclusions
    920 observations remaining, representing
    245 failures in single-record/single-failure data
    5,151 total analysis time at risk and under observation
    at risk from t = 0
    earliest observed entry t = 0
    last observed exit t = 11
    .
    . stcox i.sex age i.married householdmembers yearsofshooling i.chronic_disease i.have_disability alcoholqualtity depression i.jobtype i.jobstable i.fulltimejob totalcostofliving, nohr

    failure _d: medical_final == 1
    analysis time _t: tf

    Iteration 0: log likelihood = -331.49314
    Iteration 1: log likelihood = -314.58647
    Iteration 2: log likelihood = -312.79623
    Iteration 3: log likelihood = -312.78731
    Iteration 4: log likelihood = -312.78731
    Refining estimates:
    Iteration 0: log likelihood = -312.78731



    Cox regression -- Breslow method for ties

    No. of subjects =        296                Number of obs  =     296
    No. of failures =         77
    Time at risk    =       1473
                                                LR chi2(17)    =   37.41
    Log likelihood  = -312.78731                Prob > chi2    =  0.0030

    _t                      Coef.  Std. Err.       z   P>|z|   [95% Conf. Interval]
    sex
      women             -.5941747   .4756698   -1.25   0.212    -1.52647   .3381209
    age                 -.0787584   .0431815   -1.82   0.068   -.1633926   .0058758
    1.married           -.3554892   .5308085   -0.67   0.503   -1.395855   .6848763
    householdmembers     .1617833   .1848749    0.88   0.382   -.2005648   .5241314
    yearsofshooling      .0038488   .0603173    0.06   0.949    -.114371   .1220686
    chronic_disease
      1                 -.2000307   1.044555   -0.19   0.848   -2.247322    1.84726
      2                 -.6337595    1.04773   -0.60   0.545   -2.687273   1.419754
      3                 -.3988995   .2983077   -1.34   0.181   -.9835718   .1857728
    1.have_disability    .2235872   .3358214    0.67   0.506   -.4346106   .8817849
    alcoholqualtity      .0866133   .0932219    0.93   0.353   -.0960983   .2693248
    depression          -.1622675   .2658865   -0.61   0.542   -.6833954   .3588604
    jobtype
      2                 -1.194072   .5009966   -2.38   0.017   -2.176007  -.2121367
      3                 -1.248776   .5771651   -2.16   0.030   -2.379999  -.1175533
      4                 -2.392911    .667759   -3.58   0.000   -3.701694  -1.084127
    1.jobstable         -.5512276   .3898746   -1.41   0.157   -1.315368   .2129126
    1.fulltimejob        .1312842   .2975419    0.44   0.659   -.4518872   .7144557
    totalcostofliving   -.0011913   .0012785   -0.93   0.351   -.0036972   .0013145
    Last edited by Hayoung Choi; 12 Jan 2022, 02:05.

  • #2
    Hayoung:
    you started from 920 observations.
    However, the -stcox- output reports No. of subjects = 296, which includes those who fail (77) and those who do not (296 - 77 = 219).
    It seems to be one observation per subject.
    The difference between 920 and 296 might be due to missing values in any of the covariates, which Stata handles via listwise deletion (i.e., dropping the corresponding observations from -stcox-).
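    A minimal sketch of how one could check this, meant to be run from a do-file after -stset-. It reuses the covariate names exactly as spelled in the -stcox- call above; the wave variable name year and the helper variable nmiss are assumptions for illustration, since neither appears in the posted commands.

    ```stata
    * Covariates as spelled in the original -stcox- call
    local xvars sex age married householdmembers yearsofshooling ///
        chronic_disease have_disability alcoholqualtity depression ///
        jobtype jobstable fulltimejob totalcostofliving

    * Which covariates have missing values, and how many?
    misstable summarize `xvars'

    * Number of missing covariates per observation (nmiss is a hypothetical name)
    egen nmiss = rowmiss(`xvars')
    tabulate nmiss

    * Refit the model; e(sample) then marks the observations actually used
    stcox i.sex age i.married householdmembers yearsofshooling ///
        i.chronic_disease i.have_disability alcoholqualtity depression ///
        i.jobtype i.jobstable i.fulltimejob totalcostofliving, nohr

    * Estimation sample by wave (year is an assumed variable name)
    tabulate year if e(sample)
    ```

    If missing covariates are the only reason observations are dropped, the count of observations with nmiss equal to 0 should match the Number of obs reported by -stcox-.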
    Kind regards,
    Carlo
    (StataNow 18.5)



    • #3
      Oh, I see. Thank you for your kind response!
