Hello,
I am currently working on a survival analysis project for melanoma (a form of skin cancer). I am reasonably new to Stata, having only started using it in the past 4 months.
I have been using a Cox proportional hazard model thus far in my analyses.
Within the dataset of approximately 3,600 observations, up to 20% of the values are missing for some variables.
I have explored exclusion and other missing-data methods; however, too many of my failures would be lost for my analysis (currently 400 failures in total, all of which are melanoma-specific deaths).
I have ended up choosing multiple imputation using chained equations (MICE), given that some of the key prognostic variables are not normally distributed and are heavily skewed.
To begin with, I have selected the key prognostic variables recorded in the dataset for melanoma: Breslow thickness (continuous), ulceration status (binary) and mitotic rate (coded as an ordinal categorical variable). As predictors I have selected variables where the data are complete (no missing observations): age, melanoma subtype, sex and subsite location, as well as the outcome indicator and the survival hazard function.
Below is my code thus far for imputation. I am fairly happy with it, as the mi estimate coefficients very closely mirror the coefficients estimated from the non-imputed dataset.
My question to the forum is: what would be the appropriate process/syntax to incorporate the imputed values into the incomplete/missing data points, so that I can continue my survival analysis models with a 'complete' dataset? (Apologies if I have not worded this correctly or if this is a basic question. I have trawled through the Statalist forums, other useful sites such as UCLA, various MI lectures and the Stata manual, but could not find this process described. I have also found the MI menu interface tricky to follow.)
Many thanks in advance,
Code:
mi set mlong
mi stset timem, failure(censor2==1) scale(1)
mi register imputed breslow ulcer mitosescat4
mi impute chained (regress) breslow (logit) ulcer (ologit) mitosescat4 = agecat2 subtype sex subsitecat4 matthews_haz censor2, add(10)
mi estimate: regress breslow i.ulcer i.mitosescat4
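For reference, here is a minimal sketch of how the Cox model could be fitted directly on the mi data, rather than on a single filled-in dataset. It assumes the data have already been mi set, mi stset and imputed as in the code above. The choice of covariates and the use of factor-variable notation (i.) for agecat2, subtype, sex and subsitecat4 are assumptions made for illustration, not a recommended model.

```stata
* Sketch only: assumes the mi set / mi stset / mi impute steps above have been run
* Fit the Cox model in each of the 10 imputed datasets and pool with Rubin's rules
* (covariate list and i. coding are illustrative assumptions)
mi estimate, hr: stcox breslow i.ulcer i.mitosescat4 i.agecat2 i.subtype i.sex i.subsitecat4

* If a single completed dataset is wanted (e.g. for exploratory plots),
* the first imputation can be extracted; results from one imputation alone
* ignore between-imputation variability
preserve
mi extract 1, clear
restore
```

Pooling across imputations with mi estimate keeps the uncertainty from the imputation step in the standard errors, which analysing one extracted dataset would not.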