Survival Analysis

John Payne

Join Date: Mar 2016

Posts: 12
#1

Survival Analysis

17 Mar 2016, 18:24

Hi everyone,

I am going to perform a survival analysis (Cox proportional hazards model) for an outcome (Y) that can occur in any month after the starting point, so time is measured in discrete monthly steps.

I have several independent variables that do not change from this starting point. I also have two discrete time-dependent variables that can change every month.

My question is related to how I should set up the data in Excel, particularly for the discrete time dependent variables. I seem to find a lot of information on how to analyze and run the data in Stata, but nothing on how to actually construct the dataset.

Could anyone please guide me to an example dataset that fits this description? Or any other guidance as to how the dataset should look in Excel.

I am kind of new to statistics (econometrics) and Stata, so please bear with me if this is a stupid question.

Kind regards,

John Payne
Paul Dickman

Join Date: Apr 2014

Posts: 294
#2

18 Mar 2016, 08:40

Have a look at this page for a start:
http://www.stata.com/support/faqs/st...ell-type-data/

Your data set will have multiple observations for each subject (person/company). Specifically, one observation for each month. Some of your variables (those that are time-constant) will be the same for every observation within a subject. You will then have two variables that will potentially change each month. The link above shows how to stset the data.

Assume the outcome can only occur once. A subject that is at risk for 7 months will then be represented by 7 observations in your data set. For the first 6 observations the outcome will be 0 (censored) whereas for the last observation it could be either an event or a censoring. The first observation will be for time 0 to 1, the second observation from time 1 to 2, and so on.
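
As a minimal sketch of the layout described above: the data values and the variable names (x1 for a time-constant variable, tv1 for a monthly time-varying one) are made up for illustration, and month is assumed to record the end of each one-month interval.

```stata
* Hypothetical data: subject 1 has the event in month 3,
* subject 2 is censored after month 2.
clear
input id month event x1 tv1
1 1 0 50 0
1 2 0 50 1
1 3 1 50 1
2 1 0 30 0
2 2 0 30 0
end

* One record per subject-month. With id(), stset treats each record as
* covering the interval from the previous record's time (0 for the
* first record) up to month.
stset month, id(id) failure(event)

* _t0 and _t are the interval start and end, _d is the event indicator.
list id month event x1 tv1 _t0 _t _d, sepby(id)

* With real data the model could then be fitted with, for example:
* stcox x1 tv1
```

Listing _t0, _t and _d after stset is a quick way to check that each row covers the interval you intended.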

Note that I'm a biostatistician so may be using different terminology to that used in econometrics.

Last edited by Paul Dickman; 18 Mar 2016, 08:46.
John Payne

Join Date: Mar 2016

Posts: 12
#3

18 Mar 2016, 11:48

Thank you very much, Paul.

That was exactly the answer I was looking for.

Have a good day!
John Payne

Join Date: Mar 2016

Posts: 12
#4

26 Mar 2016, 11:56

Sorry, I was a bit unclear with one of my sentences in the opening post.

I wrote that I have several independent variables that do not change from the starting point of the study. What I meant to say is that these variables are only observed at the starting point of the study. They could change, but I don't collect any data for these variables after the starting point of the study.

Does this change the answer that was given above? Should I only input the data for these variables in the first row (first point in time) of each observation?

Any help is greatly appreciated.
John Payne

Join Date: Mar 2016

Posts: 12
#5

28 Mar 2016, 11:41

Anyone, please?
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

28 Mar 2016, 12:43

Maybe you didn't get any responses because Paul had already answered your question in post #2.

"Your data set will have multiple observations for each subject (person/company). Specifically, one observation for each month. Some of your variables (those that are time-constant) will be the same for every observation within a subject."

You could also have found this out for yourself if you had looked at the examples for multiple record data in the Manual entry for stset (e.g. example 4, p. 360 V14 manual).

How should you get time-constant observations in every month of observation? That's not easy to say, because we don't know what data sources you have or what format they are in. I can safely say that it would be much easier, and less error-prone, in Stata. I advise you to read the data you have into Stata, then do all manipulations there. You say you have one observation per month. Are there no dates of events or of measurement occasions in the source records?
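
To illustrate why this tends to be easier in Stata, here is a rough sketch under one assumed setup: the baseline variables sit in one dataset with one row per subject, and the monthly variables sit in another with one row per subject-month. Both datasets, and the variable name tv1, are hypothetical; your sources may look quite different.

```stata
* Hypothetical baseline data: one row per subject.
clear
input id sex age
1 1 50
2 2 30
end
tempfile baseline
save `baseline'

* Hypothetical monthly data: one row per subject-month.
clear
input id time tv1
1 1 0
1 2 1
1 3 1
2 1 0
2 2 0
end

* Many monthly rows match one baseline row per id, so the baseline
* values are copied onto every month. assert(match) stops with an
* error if any id appears in only one of the two datasets.
merge m:1 id using `baseline', assert(match) nogenerate

sort id time
list, sepby(id)
```

After the merge, sex and age are filled in on every monthly row without any manual copying.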

Last edited by Steve Samuels; 28 Mar 2016, 12:49.

Steve Samuels
Statistical Consulting

Stata 14.2
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

29 Mar 2016, 09:26

Below is one way of getting the constant variables into subsequent record lines.

If you describe the source data sets you have now (in Excel or otherwise), we'll be able to advise how to get them into Stata and how to carry out the steps needed to create the analysis data. The combining steps would often include merge and append.

Suppose, for example, you've imported data that has one line per month of observation, with the constant variables "sex" and "age" recorded only at time 1 (the first month); sex and age are missing for the other months. We use a local macro to process all of these variables at one time. Note that commands and results are displayed inside a CODE block, as prescribed by FAQ 12.

Code:

    clear
    input ///
    id time sex age
    1 1 1 50
    1 2 . .
    1 3 . .
    2 1 2 30
    2 2 . .
    end

    local vlist sex age // macro to hold variable names
    foreach v in `vlist' {
        bys id (time): replace `v' = `v'[1]
    }
    list

         +-----------------------+
         | id   time   sex   age |
         |-----------------------|
      1. |  1      1     1    50 |
      2. |  1      2     1    50 |
      3. |  1      3     1    50 |
      4. |  2      1     2    30 |
      5. |  2      2     2    30 |
         +-----------------------+

Similar techniques can be used to carry forward time-varying variables from the time they first appear up to the time just prior to the next change.
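
A minimal sketch of such a carry-forward, assuming a hypothetical time-varying variable x that is recorded only in the months where it changes and is missing otherwise:

```stata
* Hypothetical data: x is recorded only when it changes.
clear
input id time x
1 1 5
1 2 .
1 3 7
1 4 .
2 1 .
2 2 3
2 3 .
end

* Within each subject, in time order, fill a missing value with the
* value from the previous record. replace works through the rows in
* order, so a filled value is itself carried into the next missing row.
bysort id (time): replace x = x[_n-1] if missing(x) & _n > 1

list, sepby(id)
```

In this sketch subject 1 ends up with x = 5, 5, 7, 7. The first record of subject 2 stays missing, because there is no earlier value to carry forward.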

Last edited by Steve Samuels; 29 Mar 2016, 09:35.

Steve Samuels
Statistical Consulting

Stata 14.2