Dear Stata community
I have a more econometric question about Accelerated Failure Time (AFT) models. I have data on the age at first criminal offense (as an example), and I would like to see how this age trends across birth cohorts. My data are right-censored, because not everyone in the dataset who will eventually commit a first offense has done so yet. For example, someone who is 18 at interview may not have committed a first offense yet, but might do so at 19. Ignoring this would lead to an underestimate of the mean age at first offense. I therefore treat respondents who had not committed an offense by the time of interview as censored (in the example, censor is 0 for these respondents and 1 when a first offense is observed). The data looks as follows:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str9 birth_cohort float(age_offense censor) byte age
"1976-1980" 29 1 35
"1966-1970" 17 1 47
"1991-1995"  . 0 21
"1991-1995"  . 0 23
"1996-2000" 16 1 18
"1986-1990"  . 0 26
"1986-1990" 16 1 25
"1996-2000"  . 0 16
"1991-1995" 14 1 24
"1976-1980" 33 1 37
"1996-2000"  . 0 16
"1991-1995"  . 0 19
"1996-2000"  . 0 18
"1981-1985" 15 1 30
"1971-1975" 12 1 43
"1981-1985" 12 1 30
"1996-2000"  . 0 18
"1991-1995" 15 1 23
"1996-2000"  . 0 18
"1991-1995"  . 0 23
"1991-1995" 20 1 23
"1996-2000"  . 0 17
"1996-2000" 15 1 17
"1996-2000"  . 0 16
"1966-1970" 33 1 47
"1981-1985" 26 1 30
"1981-1985"  . 0 30
"1981-1985" 20 1 33
"1996-2000"  . 0 19
"1996-2000"  . 0 17
"1996-2000"  . 0 15
"1986-1990"  . 0 26
"1986-1990" 28 1 29
"1976-1980" 21 1 39
"1976-1980" 19 1 35
"1996-2000"  . 0 18
"1991-1995" 17 1 22
"1996-2000" 14 1 16
"1996-2000"  . 0 19
"1996-2000"  . 0 16
"1991-1995"  . 0 22
"1981-1985"  . 0 32
"1991-1995"  . 0 20
"1996-2000"  . 0 16
"1976-1980" 21 1 38
"1976-1980" 23 1 38
"1981-1985"  . 0 31
"1991-1995" 17 1 23
"1981-1985" 23 1 30
"1971-1975" 20 1 42
"1976-1980" 27 1 36
"1986-1990" 18 1 28
"1986-1990" 21 1 29
"1991-1995"  . 0 23
"1996-2000" 12 1 18
"1986-1990" 12 1 28
"1986-1990" 17 1 28
"1986-1990"  . 0 28
"1981-1985" 27 1 30
"1986-1990" 24 1 27
"1971-1975" 11 1 40
"1991-1995"  . 0 20
"1996-2000"  . 0 17
"1991-1995" 16 1 23
"1991-1995" 19 1 24
"1976-1980" 20 1 36
"1996-2000"  . 0 18
"1971-1975" 23 1 42
"1981-1985"  . 0 30
"1996-2000"  . 0 19
"1991-1995" 20 1 24
"1996-2000"  . 0 19
"1991-1995" 14 1 21
"1981-1985" 24 1 32
"1991-1995" 22 1 23
"1981-1985" 23 1 32
"1996-2000"  . 0 17
"1991-1995" 18 1 21
"1981-1985" 23 1 31
"1996-2000"  . 0 16
"1971-1975" 23 1 41
"1986-1990" 20 1 29
"1996-2000"  . 0 18
"1976-1980" 18 1 38
"1981-1985" 22 1 33
"1981-1985" 14 1 30
"1996-2000"  . 0 16
"1991-1995" 17 1 23
"1976-1980" 21 1 37
"1966-1970" 21 1 47
"1991-1995"  . 0 24
"1981-1985" 18 1 34
"1991-1995"  . 0 23
"1991-1995" 15 1 20
"1976-1980" 20 1 35
"1976-1980" 16 1 38
"1996-2000"  . 0 16
"1981-1985" 25 1 31
"1986-1990" 22 1 28
"1986-1990" 23 1 28
end
I want to account for this right censoring using survival analysis. I cannot use a Cox PH model because the proportional hazards assumption fails across birth cohorts. So, I am interested in the Accelerated Failure Time model instead. This model requires an assumption about the distribution of the first_offense times. I compared candidate distributions using the AIC and BIC, and the generalized gamma distribution fits the data best. So, now I would run my regression as follows:
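Code:
* first_offense must be non-missing for censored respondents (e.g. their age at interview)
stset first_offense, failure(censor==1)
streg birth_cohort_* age, distribution(ggamma) tratio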
I have split the categorical birth_cohort variable into dummy variables (birth_cohort_*), and I would like to control for the respondent's age as well.
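To make the setup above concrete, here is a minimal sketch of how it could be reproduced from the -dataex- example. It rests on two assumptions that are not stated in the post: that censored respondents are followed up to their age at interview (the variable age), and that first_offense has not been created yet. The name cohort_id is hypothetical and is only used so that factor-variable notation can stand in for the manual dummies.

```stata
* Sketch only: assumes the -dataex- example is in memory and that
* first_offense does not exist yet.

* Analysis time: age at first offense if observed, otherwise age at
* interview (assumption: censored respondents are followed until interview).
generate first_offense = cond(censor == 1, age_offense, age)

* Numeric cohort identifier (hypothetical name) so that i.cohort_id can
* replace the manually created birth_cohort_* dummies.
encode birth_cohort, generate(cohort_id)

* censor == 1 marks an observed first offense; censor == 0 is right-censored.
stset first_offense, failure(censor==1)

* Check that censored respondents remain in the analysis sample.
stdescribe

* Same AFT specification as above, written with factor-variable notation.
streg i.cohort_id age, distribution(ggamma) tratio
```

If first_offense were left missing for censored respondents, stset would exclude them from the analysis, so it is worth confirming the subject and failure counts reported by stset and stdescribe.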
This is my first time working with survival analysis, so I would just like to check whether this type of model makes sense, and whether there is anything else that I should maybe keep in mind? Are there any specific checks (post-estimation or otherwise) that I could do to see how well this model fits? Like I said, I only want to assess the trend in age at first offense over birth cohorts.
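For reference, a distribution comparison by AIC and BIC like the one mentioned above could be sketched as follows. This is only an illustration: the list of candidate distributions is an assumption and not necessarily the set that was actually compared, and the stored-estimate names (m_...) are hypothetical.

```stata
* Sketch only: assumes the data have already been -stset- and that the
* birth_cohort_* dummies exist. The distribution list is an assumption.
foreach d in exponential weibull lognormal loglogistic ggamma {
    quietly streg birth_cohort_* age, distribution(`d')
    estimates store m_`d'
}

* AIC and BIC for all fitted models in one table; smaller values
* indicate a better trade-off between fit and complexity.
estimates stats m_exponential m_weibull m_lognormal m_loglogistic m_ggamma
```

All models are fitted to the same stset sample, so the information criteria are comparable across distributions.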
Thank you so much for your kind assistance!
Regards,
Christiaan