  • Effect of variables six months before dying - Cox Regression

    I don't know if I'm allowed to replicate my question here, but in this forum I usually always get a good answer!
    Originally posted here: http://stats.stackexchange.com/quest...cox-regression

    I have a sample of 3000 patients and, for each of them, I know the number of years they lived. I was able to monitor these patients over 10 months (at a random time of their life). So I have left and right censoring in my data.

    We were able to measure a specific continuous variable named X. We noticed that when the value of X is very high, the patient is more likely to die in exactly 6 months (X has no consistent trend afterwards, so 5 to 1 months before dying it can increase or decrease with no meaning).

    I'm using Cox regression to assess the influence of variable X on the probability of dying. As a base for the regression I use the age of patients. I only want to focus on the value of X exactly 6 months before dying, for patients who died, and compare it with the average values of patients who didn't die. The aim is to predict which patients are more likely to die.

    To summarize, my variable X is equal to: - the average over the 10 monitoring months, for patients who did not die; - the value measured 6 months before dying, for patients who died. This is the variable I put into the Cox regression.

    I would really appreciate your help to know if there's any error in my procedure.
    Thanks a lot!
    Andrea




  • #2
    The proposed analysis is incorrect: defining X for non-deaths as the average relies on information available only after the X's were observed, namely whether the person died. The proper approach is to define for every person a time-varying covariate which is the six-month-lagged value of X.

    However, what you are describing is a classic case of data dredging: after looking at many patterns, you selected the single big effect. Naturally the p-value will be "significant", but the estimated size is worthless after such a procedure. I can't think of a multiplicity adjustment like Bonferroni that would work here, because there are so many patterns that might have caught your eye.

    One honest statistical assessment, short of a new study, is to simulate data under the assumption that there is no or little effect of X. Then compute the proportion of simulations in which the maximum effect over all lags was at least as large as the one you observed. Even this doesn't cover the other possible interesting patterns.

    The pattern you saw also appears scientifically implausible to me: it's hard to think of a lagged effect that would not also appear, to a lesser degree, in neighboring months.
    Last edited by Steve Samuels; 29 Oct 2015, 09:35.
    Steve Samuels
    Statistical Consulting
    [email protected]

    Stata 14.2



    • #3
      Thanks a lot Steve!
      What if I still use Cox regression considering for every patient the value of X in the 6 months before the end of the observation period or the 6 months before dying, if they die? Would it be correct in that way?

      If I use a longitudinal approach as you say (like multilevel logit or simple logit) I find that X is significant but the pseudo R-squared is very very low.

      I'm not sure about data dredging, since what we find makes perfect sense to us. There was a previous hypothesis that would be verified in this way. However, I'm looking for the best statistical method that gives us results. And, of course, one that is correct.

      Oh yes, and by the way, the effect also appears, reduced, in month 5 before dying, but I did not mention that, to simplify my question.

      Thanks again!
      Andrea.


      Last edited by Andrea Arancio; 29 Oct 2015, 10:36.



      • #4
        "What if I still use Cox regression considering for every patient the value of X in the 6 months before the end of the observation period or the 6 months before dying, if they die? Would it be correct in that way?"
        That's not the way Cox works. It compares the value of X for each person who dies at a time t to the (weighted) average of X for all other people who are "at risk" at t, i.e. people in the same "risk set." Some of these other risk set members may later die, but that is irrelevant. If you use the average of all non-deaths in stcox, you no longer have a causal model (one in which cause precedes effect), because you don't know who will not die until the end of the study.
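
        One way to write this comparison down is the standard Cox partial likelihood, shown here without ties. The notation is an assumption for illustration and does not come from the thread: t_i is the failure time of person i, \delta_i = 1 marks a death, R(t_i) is the risk set at t_i, and x_j(t_i) is the covariate value of person j at that time. The score on the right shows each death's X set against the weighted average of X over its risk set.

        ```latex
        L(\beta) = \prod_{i:\,\delta_i = 1}
          \frac{\exp\{\beta x_i(t_i)\}}{\sum_{j \in R(t_i)} \exp\{\beta x_j(t_i)\}},
        \qquad
        \frac{\partial \log L(\beta)}{\partial \beta} = \sum_{i:\,\delta_i = 1}
          \left[ x_i(t_i) -
          \frac{\sum_{j \in R(t_i)} x_j(t_i)\,\exp\{\beta x_j(t_i)\}}
               {\sum_{j \in R(t_i)} \exp\{\beta x_j(t_i)\}} \right]
        ```

        Every term uses covariate values known at t_i only, which is why an average that depends on who survives to the end of the study has no place in it.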

        So create the time-varying covariates for all the other lags of interest.

        You mention a "longitudinal" logit model. The only way that a simple logistic model can be considered longitudinal is if it measures status at a fixed time after the start of observation (e.g. 12 months). In that case, covariates must be known at baseline for everyone, which is obviously not true in your case.

        Thanks for the clarification about the rationale for your analysis: you said originally only that you "noticed" this phenomenon and that at 5 months it could "increase or decrease with no meaning".
        Last edited by Steve Samuels; 29 Oct 2015, 11:40.
        Steve Samuels
        Statistical Consulting
        [email protected]

        Stata 14.2



        • #5
          Thanks again, and sorry if statistics is obviously not my field!
          Would you be so kind as to explain more extensively what you mean by
          "So create the time-varying covariates for all the other lags of interest"?
          You also say:
          "The proper approach is to define for every person a time-varying covariate which is the six-month-lagged value of X."
          How would you use these time-varying variables in the Cox regression?

          Let's say I create a 6-month lagged variable for X (that is, for each month, the value of X six months earlier). For people who did not die I have 10 measures of X (one per observation month). Accordingly I could obtain 4 lagged values of X (the first one would be at observation month 7, where I would use the value of X at observation month 1). Is that right?
          My doubt is how to use these 4 observations per patient in the Cox regression.

          One way could be splitting the survival time into episodes, so that X has value X1 from age 0 to 5 (first observation month) and then value X2 from age 5 to age 5 and one month.
          However, this last idea looks wrong, because I cannot assume X was constant from birth to the age at first observation. Also, patients were observed starting at different ages (during the 10-month observation period some were 5 years old, some were 8, and so on).

          WDYT? Thanks for your patience.




          • #6
            Belated answer to opening question in #1: cross-posting policy is explained in http://www.statalist.org/forums/help#crossposting and is that you should tell us about it. So that's fine.



            • #7
              Some questions:
              Do you have exact dates of death? Were dates recorded so that you can get days of followup?
              Last edited by Steve Samuels; 29 Oct 2015, 15:54.
              Steve Samuels
              Statistical Consulting
              [email protected]

              Stata 14.2



              • #8
                I have the month when they die and I know their age when they die (approximation to months), but maybe I could also get the exact days. Would this help?



                • #9
                  With only 10 possible "times" of death (or nine if month is the month of enrollment), it's possible that there are too many tied times of death to use a Cox analysis. Exact dates of start of follow-up and of death or withdrawal would break the ties, so it would be worth the effort to get them.

                  So we'd need to know: how many deaths and the maximum number in any month of followup.


                  Dates of the monitoring visits would also be helpful in identifying which visit corresponds to, e.g., the 6-month lag at the date of a death. This could be defined as the visit nearest to 180 days prior to that date (180 = 6 x 30 days). Otherwise there would be some ambiguity in the actual length of the lag for a visit "six months" ago.
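
                  A minimal sketch of that matching rule, assuming long-format data with one record per visit. The variable names id, visit_date, and ref_date are hypothetical: ref_date holds the date of interest (for example, a date of death) and is missing where there is none. Both dates are assumed to be Stata daily dates.

                  ```stata
                  * Target date: 180 days before the reference date
                  gen target = ref_date - 180

                  * Distance in days from each earlier visit to the target date
                  gen gap = abs(visit_date - target) if !missing(ref_date) & visit_date < ref_date

                  * Flag, per patient, the visit closest to the target
                  * (missing gaps sort last; ties go to the earlier visit)
                  bysort id (gap visit_date): gen byte lag6_visit = (_n == 1) & !missing(gap)

                  * Actual lag in days at the flagged visit, to see how far it is from 180
                  gen lag_days = ref_date - visit_date if lag6_visit
                  ```

                  Summarizing lag_days would show how much the "six-month" lag really varies across patients.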
                  Steve Samuels
                  Statistical Consulting
                  [email protected]

                  Stata 14.2



                  • #10
                    Thanks a lot Steve. I understand this and will use it.
                    It is still not clear to me how to use the time-varying X, as in my #5. Any advice on that?



                    • #11
                      I'm sorry for the delay in responding, Andrea. A laptop malfunctioned while I was traveling.

                      A time-varying covariate is one that changes with time. See the Manual entry for stset, Example 3, p. 359 for an example inputting such data. You might have dates instead of "times" and would do well to add monitoring visit number. The manual entry for stcox has examples of

                      You need to learn concepts of survival analysis, in particular the hazard function. The book Survival Analysis with Stata by Cleves (Stata Press) is good. I also like the draft of "Survival Analysis" by Stephen Jenkins, which is linked to on his website:

                      With dates you will use the "continuous time" sections. The hazard rate is defined in Chapter 2. The proportional hazards model is defined in Section 3.2.8. The Cox model is presented in Chapter 7. Also on Stephen's web page are lessons for analyzing survival data. Lesson 3 "Preparing survival data.." Section 6.3 talks about setting up data where covariates have multiple values over time. The Stata manual entry for stcox contains examples and methods of assessing the proportional hazards assumption.

                      The hazard function: the thing to know about the hazard function is that it is a function h(t|z) which, roughly, is the instantaneous rate of failure at t for people with covariate values z at t. These need not be covariates measured at t; some could be, for example, the value of a measurement made 1, 2, 3, or k months prior to t.
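
                      For reference, the usual textbook forms can be written as follows. The symbols are assumptions for illustration: T is the failure time, h_0(t) the baseline hazard, x(t - k) the value of X measured k months before t, and \beta_k its coefficient.

                      ```latex
                      h(t \mid z) = \lim_{\Delta \to 0}
                        \frac{\Pr(t \le T < t + \Delta \mid T \ge t,\; z)}{\Delta},
                      \qquad
                      h\bigl(t \mid x(t-k)\bigr) = h_0(t)\,\exp\{\beta_k\, x(t-k)\}
                      ```

                      With k = 6 the second expression is the proportional hazards model for the six-month-lagged value of X.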

                      stcox needs to know the values at each failure time t for each individual still at risk of failing at t. These are the members of the risk set at t; deaths and dropouts prior to t are not present. Your data have values for covariates only at monitoring visits, but stsplit, at(failures) will create a new data record for each individual at all failure times for which the individual is at risk. The record will contain the most recent values of each time-varying covariate.

                      It is easy to get the lagged values of your variable x at prior monitoring visits: you can use Stata's time-series operators, even if some people miss a visit. For example:
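
                      A rough sketch of that workflow, under an assumed data layout that is not from the thread: one record per patient per monitoring interval, with hypothetical variables id, birth_date, start_date, end_date (daily dates), died (1 on the record that ends in death), and x_6 (the lagged covariate carried on that record). Age is taken as the time scale here.

                      ```stata
                      * Declare multiple-record survival data: time is age in years,
                      * and each patient becomes at risk only from the first observed interval
                      stset end_date, id(id) failure(died==1) origin(time birth_date) time0(start_date) scale(365.25)

                      * Add a record for each patient at every failure time at which they are at risk
                      stsplit, at(failures)

                      * Cox model with the time-varying lagged covariate
                      stcox x_6
                      ```

                      Because entry is delayed to the first observed interval, nothing has to be assumed about X between birth and the start of monitoring.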

                      Code:
                      tsset id visit
                      
                      gen date_1 = L.date    // date of the previous visit
                      gen date_2 = L2.date   // date two visits ago
                      
                      gen x_1 = L.x          // value of x at the previous visit
                      gen x_2 = L2.x         // value of x two visits ago
                      However, you might need to reconsider your goal of analyzing six-month lags. As you may have noticed, these are defined only for individuals who are at visits 7, 8, 9, and 10. If you want to compare effects of shorter lags to those of six-month lags, you will have to confine the analysis to those at risk at 7 months or later.
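
                      A small sketch of the six-month case, assuming visits are numbered 1 to 10 and spaced about one month apart; the name x_6 is hypothetical.

                      ```stata
                      tsset id visit

                      gen x_6 = L6.x    // value of x six visits (about six months) earlier

                      * x_6 is nonmissing only from visit 7 onward
                      tabulate visit if !missing(x_6)
                      ```

                      Restricting any model for shorter lags to records with nonmissing x_6 would keep the comparison on the same set of people and times.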

                      Steve Samuels
                      Statistical Consulting
                      [email protected]

                      Stata 14.2



                      • #12
                        Thanks a lot Steve!

                        Your answer was precious and I'm studying on the materials you shared. Everything appears a bit more clear now.

