How to pool estimated results in Multiple Imputation?

Anupam Ghosh

Join Date: Jan 2023

Posts: 113
#1

05 Dec 2024, 23:18

Hi Statalisters,

I am trying to impute missing crime data in the UCR. However, I am not sure how to proceed after the estimation stage. Am I supposed to use the already imputed values? I understand that pooling combines the post-estimation results into one set of results, but I cannot work out how to get there. I would greatly appreciate some help. P.S. This is my first time using the MI package.

mi impute chained (pmm, knn(15)) Murder Manslaughter Rape Robbery Assault Burglary Larceny_Theft ///
Vehicle_Theft Other_Assault Arson Forgery Fraud Embezzlement Stolen_Property Vandalism Weapons ///
Prostitution Drugs_Offenses Drugs_Sale Drugs_Possesion DUI Disorderly ///
= Tot_Officers Tot_Civ_Emp i.Major i.Minor Unemp_Rate Male_Perc Hispanic_Perc ///
Black_Perc Age_1624_Perc PerCapWageL PopDensityL ///
, add(20) rseed(12345) nolegend noisily bootstrap

local dep_vars "Murder Manslaughter Rape Robbery Assault Burglary Larceny_Theft Vehicle_Theft Other_Assault Arson Forgery Fraud Embezzlement Stolen_Property Vandalism Weapons Prostitution Drugs_Offenses Drugs_Sale Drugs_Possesion DUI Disorderly "

local indep_vars "Tot_Officers Tot_Civ_Emp Unemp_Rate Male_Perc Hispanic_Perc Black_Perc Age_1624_Perc PerCapWageL PopDensityL"

foreach dep_var in `dep_vars' {
mi estimate, vartable: xtreg `dep_var' `indep_vars' L.Major L.Minor i.year , fe vce(cluster fips)
mi predict xb(`dep_var'_pred)
}
Tags: chained equation, data, missing data, multiple imputation, panel data
Felix Bittmann

Join Date: Aug 2018

Posts: 616
#2

06 Dec 2024, 01:05

I am not sure about the loop, since you apparently want to estimate many models, but your general approach is correct because you use mi estimate. If you use xtreg, you also need to run

Code:

mi xtset

before.

Best wishes

(Stata 16.1 MP)
Anupam Ghosh

Join Date: Jan 2023

Posts: 113
#3

06 Dec 2024, 01:14

I have xtset the data; I may have missed sharing that code here. What I am unable to understand is how to pool the estimated results. As I understand it, after the analysis (mi estimate) stage I would get a single set of pooled observations, right? How do I get that?
Felix Bittmann

Join Date: Aug 2018

Posts: 616
#4

06 Dec 2024, 01:20

No, mi estimate does all the work for you: it fits the model on each imputed dataset and automatically pools the resulting estimates following Rubin's rules. Have a look at

Code:

help mi estimate

Best wishes

(Stata 16.1 MP)
Anupam Ghosh

Join Date: Jan 2023

Posts: 113
#5

06 Dec 2024, 02:14

Please pardon my ignorance, but are these estimates then stored in the original imputations themselves? I am a little confused: where are the pooled estimates stored? How do I replace the missing observations with the estimated and pooled results?
Felix Bittmann

Join Date: Aug 2018

Posts: 616
#6

06 Dec 2024, 02:35

This information is given in

Code:

help mi set

The imputed data are stored in the long format if you have used flong. The dataset (browse) contains the original data (_mi_m == 0) and the imputed data (_mi_m > 0).
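
To make that layout concrete, here is a minimal sketch. It assumes the mheart1s20 example dataset from Stata's MI documentation as a stand-in for your UCR panel, and it converts to flong purely for illustration, so your own style and variable names will differ.

```stata
* Sketch only: mheart1s20 is an example dataset from Stata's MI documentation,
* used here as a stand-in for the UCR panel; it already contains 20 imputations.
webuse mheart1s20, clear

mi query                 // reports the mi style and the number of imputations
mi describe              // lists imputed, passive, and regular variables

* Assumption: convert to flong so that every imputation is a full copy of the data
mi convert flong, clear
tabulate _mi_m           // _mi_m == 0 is the original data, 1..20 are the imputations
```

Note that nothing here is "pooled": each value of _mi_m is one complete version of the data, and the pooling happens only when mi estimate combines the model results.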

Best wishes

(Stata 16.1 MP)
Clyde Schechter

Join Date: Apr 2014

Posts: 29795
#7

06 Dec 2024, 11:18

No estimation commands in Stata store their results in the data set, as far as I am aware. Estimation commands store the coefficients in e(b) and the variance-covariance matrix in e(V). Peculiarly, -mi estimate- does not do that by default, but you can ask for it by adding the -post- option to the -mi estimate:- prefix. So if you do

Code:

mi estimate, post vartable: xtreg `dep_var' `indep_vars' L.Major L.Minor i.year , fe vce(cluster fips)

then after the regressions are all run, the pooled estimates will be found in e(b) and their variance-covariance matrix in e(V).

Perhaps more conveniently, the results that are displayed in the results table (coefficients, standard errors, test statistics) are also stored in r(table). Do remember that any Stata results stored in r() are at risk of being overwritten by subsequent commands, so if you want to use these, it behooves you to store r(table) as a matrix immediately after the -mi estimate- command.
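
As a self-contained illustration of the -post- route, here is a small sketch. It assumes the mheart1s20 example dataset and the logit model from Stata's MI documentation rather than your xtreg setup; the matrix names b and V are just names I picked.

```stata
* Sketch only: example data and model from Stata's MI documentation,
* standing in for the xtreg models in this thread.
webuse mheart1s20, clear

* -post- asks mi estimate to leave the pooled results in e(b) and e(V)
mi estimate, post: logit attack smokes age bmi hsgrad female

* Copy the pooled results right away, before another command replaces them
matrix b = e(b)          // pooled coefficients (row vector)
matrix V = e(V)          // pooled variance-covariance matrix
matrix list b
matrix list V
```

Inside your loop you would do the copying immediately after each -mi estimate- call, for example into matrices named after the dependent variable.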
Anupam Ghosh

Join Date: Jan 2023

Posts: 113
#8

07 Dec 2024, 02:17

Clyde,

Thank you very much for responding. I am guessing that mi estimate will yield estimated coefficients for each row of the data, right?
Clyde Schechter

Join Date: Apr 2014

Posts: 29795
#9

07 Dec 2024, 11:06

I am guessing that mi estimate will yield estimated coefficients for each row of the data, right?

No! The output of an -mi estimate- command will resemble the output of whatever regression command is being -mi estimate-d. There will be one estimated coefficient for each right hand side variable of the regression model (where, for discrete variables, the number of "variables" is going to be the number of levels of the discrete variable minus 1).

Are you thinking about something analogous to the -predict- command, which produces an estimated value of the outcome variable for each observation ("row") in the data set? For that, you use the -mi predict- command. You have mentioned -mi predict- in your code in #1, but the syntax you use there is incorrect. Do read the help file to see how it is used. Bear in mind that -mi predict- requires that you -estimates save- the -mi estimate- results first. That is not done by the methods I mentioned in #7. That, instead, relies on the -estimates save- command. Do read the help file for that as well if you are not already familiar with it.
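
For orientation, here is a hedged sketch of the general pattern. As far as I can tell from the MI documentation, the file that -mi predict- reads is the one written by the saving() option of -mi estimate-. The dataset mhouses1993s30 and the regression are the documentation's example, and miest and xb_mi are just illustrative names; I have not checked how this carries over to -xtreg, fe- with lagged regressors.

```stata
* Sketch only: example data and model from Stata's MI documentation.
webuse mhouses1993s30, clear

* Save the per-imputation estimation results to miest.ster
mi estimate, saving(miest, replace): regress price tax sqft age nfeatures ne custom corner

* Linear predictions based on the saved MI results, one value per observation
mi predict xb_mi using miest

* Look at the predictions in the original data (m = 0)
mi xeq 0: summarize price xb_mi
```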
Anupam Ghosh

Join Date: Jan 2023

Posts: 113
#10

08 Dec 2024, 01:29

I think I have some clarity on the process now. I guess the mi estimate command runs the regression that I would have run had the data been complete in the first place. On a different note, can you tell me whether averaging all the imputed datasets and using the averaged data (without the mi wrapper) in regressions should yield results similar to those from the mi estimate command?
Clyde Schechter

Join Date: Apr 2014

Posts: 29795
#11

08 Dec 2024, 09:23

I think that averaging the coefficients from separate regressions on the imputed data sets does produce the same coefficients you get from -mi estimate-. But the standard errors, and the other statistics that depend on them, are not that simple.
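
To show why the standard errors are not a simple average, here is a small Mata sketch of Rubin's rules for a single coefficient. The three estimates and squared standard errors are made-up numbers for illustration only, not results from any data in this thread.

```stata
mata:
// Hypothetical results for one coefficient from M = 3 imputed datasets
q = (1.0 \ 1.2 \ 1.4)        // per-imputation point estimates
u = (0.04 \ 0.05 \ 0.06)     // per-imputation squared standard errors

M    = rows(q)
qbar = mean(q)               // pooled coefficient: the simple average
W    = mean(u)               // within-imputation variance
B    = variance(q)           // between-imputation variance (divisor M - 1)
T    = W + (1 + 1/M) * B     // total variance under Rubin's rules

printf("pooled coefficient = %9.4f\n", qbar)
printf("pooled std. error  = %9.4f\n", sqrt(T))
printf("naive std. error   = %9.4f\n", sqrt(W))
end
```

With these numbers the pooled standard error is about 0.32, while averaging the squared standard errors alone gives about 0.22, because the between-imputation term B carries the extra uncertainty due to the missing data.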
Anupam Ghosh

Join Date: Jan 2023

Posts: 113
#12

08 Dec 2024, 13:32

Just to confirm that we are on the same page, here's what I am thinking. I would like to use the `var'_ImpMean to replace the missing observations and then collapse the data by county year, which I could later use to run regressions without the "mi" wrapper. Is that what you also mean by averaging coefficients?

mi impute chained (pmm, knn(15)) Murder Manslaughter Rape Robbery Assault Burglary Larceny_Theft ///
Vehicle_Theft Other_Assault Arson Forgery Fraud Embezzlement Stolen_Property Vandalism Weapons ///
Prostitution Drugs_Offenses Drugs_Sale Drugs_Possesion DUI Disorderly Violent Property Total_Crime ///
= Tot_Officers Tot_Civ_Emp i.Major i.Minor Unemp_Rate Male_Perc Hispanic_Perc ///
Black_Perc Age_1624_Perc PerCapWageL PopDensityL ///
, add(20) rseed(12345) nolegend noisily bootstrap
************************************************** ****************************
///noimputed bootstrap

local vars "Murder Manslaughter Rape Robbery Assault Burglary Larceny_Theft Vehicle_Theft Other_Assault Arson Forgery Fraud Embezzlement Stolen_Property Vandalism Weapons Prostitution Drugs_Offenses Drugs_Sale Drugs_Possesion DUI Disorderly Violent Property Total_Crime"
foreach var of local vars {
egen `var'_ImpMean = rowmean(_1_`var'-_20_`var')
replace `var'_ImpMean = round(`var'_ImpMean)
}

Last edited by Anupam Ghosh; 08 Dec 2024, 13:42.
Clyde Schechter

Join Date: Apr 2014

Posts: 29795
#13

08 Dec 2024, 16:02

Just to confirm that we are on the same page, here's what I am thinking. I would like to use the `var'_ImpMean to replace the missing observations and then collapse the data by county year, which I could later use to run regressions without the "mi" wrapper. Is that what you also mean by averaging coefficients?

If all you care about are the coefficients themselves, you could do that. But you would not have usable standard errors, test-statistics, p-values, or confidence intervals using this method. Why not just run -mi estimate- again? That gives you complete and useful statistics all around.