Dear all
I am looking for some advice about the best way to graphically plot a non-linear continuous predictor in a discrete time proportional hazards model. I have found Stephen Jenkins' pages on DTPH very useful, but have got to a position where I require some guidance from the community.
I am modelling the time to an outcome on the age scale (years) and wish to see whether two variables, moving house in a given year (binary) and total distance moved (continuous), predict the outcome after controlling for a set of covariates.
I find that a model with a cubic term for distance moved gives the best fit to the data, as follows:
Code:
cloglog outcome discrete_age i.moved_house c.distance c.distance#c.distance c.distance#c.distance#c.distance, eform
I assume I have set up the model correctly and specified the cubic term properly.
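For reference, here is a sketch of what this command fits, written out. The symbols are labels introduced here for illustration, not Stata output: $h(a \mid x)$ is the discrete-time hazard at age $a$, $m$ is `moved_house`, and $d$ is `distance`. Note that `discrete_age` enters as a single linear term in this specification.

```latex
\[
\log\bigl\{-\log\bigl[1 - h(a \mid x)\bigr]\bigr\}
  = \alpha_0 + \alpha_1 a + \gamma m + \beta_1 d + \beta_2 d^2 + \beta_3 d^3
\]
\[
\mathrm{HR}(d \text{ vs } 0) = \exp\bigl(\beta_1 d + \beta_2 d^2 + \beta_3 d^3\bigr)
\]
```

The second line is the hazard ratio implied by the cubic at distance $d$ relative to zero distance, holding the other covariates fixed. It depends only on the three distance coefficients and not on the baseline terms.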
I have tried to use the margins command to estimate the marginal hazard, but I understand that this is not advisable in a proportional hazards setting because the underlying hazards are not known. If I do so anyway, I run the following:
Code:
margins, at(distance=(min(range)max))
marginsplot, noci
This produces a plot of the predicted hazard against distance.
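To make the `min(range)max` placeholder concrete, here is a minimal sketch that assumes a hypothetical grid from 0 to 3000 km in steps of 250 km. These values are illustrative only and are not taken from the data.

```stata
* Assumes the cloglog model above has just been fitted.
* Hypothetical grid: distance = 0, 250, 500, ..., 3000 km.
margins, at(distance=(0(250)3000))

* Plot the predicted probabilities without confidence intervals.
marginsplot, noci
```

Because the model uses factor-variable notation (`c.distance#c.distance`), margins updates the squared and cubed terms automatically at each grid value. Dropping `noci` draws the 95% CIs that marginsplot shows by default.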
While this gives me some visual clue as to the relationship, I am concerned that it is wrong to do this, but I do not know what to do instead. When 95% CIs are added to the plot, I am sure they are wrong.
I presume they should get broader at greater distances, because a histogram shows that the vast majority of participants have very small cumulative distances, with very few over 2000 km. The dataset contains 1.4m participants.
I would be grateful for any advice on a correct way to visually plot this non-linear relationship between distance and outcome, with correct 95% CIs, following discrete time proportional hazards modelling.
Thanks
James