Dear all,
I have worked with Stata for some years; however, for a new research project I want to implement a survival model (e.g. a Cox proportional hazards model). So far I am struggling to set up Stata so that it understands the structure of my dataset.
My data looks like this:
I have different companies observed over a specific range of years (2001 - 2015). The companies emerge at a certain point in time and can disappear from my sample because of a specified failure. This failure is very specific and is not the only reason a company can disappear from my data. I also have two subgroups in my sample (two types of firms). The aim is to estimate the hazard function for each subgroup and plot it as a graph. Here is an illustration of my data:
ID | year | failure | subgroup_1 |
1 | 2001 | 0 | 1 |
1 | 2002 | 1 | 1 |
2 | 2004 | 0 | 0 |
2 | 2005 | 0 | 0 |
3 | 2001 | 0 | 1 |
Code:
stset year, id(ID) failure(failure==1)

                id:  ID
     failure event:  failure== 1
obs. time interval:  (year[_n-1], year]
 exit on or before:  failure

       4885  total observations
          4  observations begin on or after (first) failure
       4881  observations remaining, representing
        536  subjects
        109  failures in single-failure-per-subject data   --> This is the correct number of failures !
    1078358  total analysis time at risk and under observation
                                          at risk from t =         0
                               earliest observed entry t =         0
                                    last observed exit t =      2015
Code:
sts graph, by(subgroup_1)
My questions are:
1. Is my data set up correctly this way? If not, what would be the correct setup?
2. Why does the graph look so strange? Is there any way to change the x and y axes? (I guess this might have to do with a wrong setup of my data...)
Many thanks in advance for your answers,
John
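A possible explanation, offered as a sketch rather than a definitive answer: the stset output above reports "earliest observed entry t = 0" and "last observed exit t = 2015", so the calendar year itself is being used as analysis time and every firm is treated as at risk from year 0. That would account for the very large total time at risk and probably for the odd x axis. One way to address it, assuming a firm's first observed year marks its entry into the risk set, is to shift the origin per firm. The variable name first_year is hypothetical, and the axis ranges and titles are illustrative only. Note also that sts graph draws the Kaplan-Meier survivor function by default; the hazard option requests smoothed hazard estimates instead.

```stata
* Assumption: each firm's first observed year is when it becomes at risk.
* first_year is a hypothetical helper variable, not part of the original data.
bysort ID (year): generate first_year = year[1]

* Measure analysis time from one year before the first observation,
* so the first record spans (0, 1] rather than (0, 2001].
* Firms that leave for other reasons never have failure==1 and are treated as censored.
stset year, id(ID) failure(failure==1) origin(time first_year - 1)

* Kaplan-Meier survivor curves by subgroup, with explicit axis labels
sts graph, by(subgroup_1) xlabel(0(5)15) xtitle("Years since entry") ytitle("Survival probability")

* Smoothed hazard estimates by subgroup
sts graph, hazard by(subgroup_1) xtitle("Years since entry") ytitle("Smoothed hazard")
```

After re-running stset, the summary should show a last observed exit of at most 15 rather than 2015. If firms that already existed before 2001 are in the sample, their true entry date is earlier than first_year and the origin would need to reflect that.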