Hi. I'm trying to understand the best way to work with imputed data in Stata after imputation, specifically exporting the imputed data and performing regressions on the imputed data. I am using Stata/SE 15.1 for Windows.
I have 11 categorical variables and 355 respondents, with roughly 20% of each variable's data missing. I have used the Statistics>Multiple imputation>Multiple imputation control panel to impute the missing values. I have several questions about working with the data following imputation.
1) I have imputed the data 20 times and want to export the imputed sets to Excel so that I can average the 11 variable responses per respondent across the 20 data sets and develop a score from the overall average per respondent. After imputation, I used the File>Export>Data to Excel spreadsheet command, but I'm having difficulty understanding the exported Excel file. The first portion is obviously my original data with the missing values, but the groups of imputed values below it don't seem to correspond to the missing values in the original data set (at the top of the spreadsheet). Am I mistaken, or am I misunderstanding the imputation output? Is there a cleaner way to export the data so that I get 20 complete sets of data with the imputed values included? I've tried multiple export commands and can't seem to find a solution. I've attached the associated file (Imputed B24 data.xls). Any insight would be greatly appreciated.
2) Following imputation, I want to perform various analyses on the imputed data. For instance, I want to regress the 11 imputed variables in the attached spreadsheet on 'age'. For a logistic regression I know to use the 'logit' command, but I'm uncertain how to reference my newly imputed data on the command line. I chose 'marginal long' as the data style when conducting the imputation and thought I could refer to the new data as 'mlong', but I keep getting error r(111) (variable not found).
3) This last part isn't really a problem, just something I need clarification on as I'm new to multiple imputation. When I regress the newly imputed data on other variables (such as age), Stata will use all 20 imputed data sets to run the regressions, correct? I want to make sure I understand this so I can explain it adequately.
If clarification is needed or I need to move this post to another forum, please let me know. Thank you very much in advance for any advice that is given.