  • How to pool estimated results in Multiple Imputation?

    Hi Statalisters,

    I am trying to impute missing crime data in the UCR, but I am not sure how to proceed after the estimation stage. Am I supposed to use the already imputed values? I understand that pooling collects the data post-estimation and gives one set of results, but I do not understand how to get there. I would greatly appreciate some help. P.S. This is my first time using the MI package.

    mi impute chained (pmm, knn(15)) Murder Manslaughter Rape Robbery Assault Burglary Larceny_Theft ///
    Vehicle_Theft Other_Assault Arson Forgery Fraud Embezzlement Stolen_Property Vandalism Weapons ///
    Prostitution Drugs_Offenses Drugs_Sale Drugs_Possesion DUI Disorderly ///
    = Tot_Officers Tot_Civ_Emp i.Major i.Minor Unemp_Rate Male_Perc Hispanic_Perc ///
    Black_Perc Age_1624_Perc PerCapWageL PopDensityL ///
    , add(20) rseed(12345) nolegend noisily bootstrap

    local dep_vars "Murder Manslaughter Rape Robbery Assault Burglary Larceny_Theft Vehicle_Theft Other_Assault Arson Forgery Fraud Embezzlement Stolen_Property Vandalism Weapons Prostitution Drugs_Offenses Drugs_Sale Drugs_Possesion DUI Disorderly "

    local indep_vars "Tot_Officers Tot_Civ_Emp Unemp_Rate Male_Perc Hispanic_Perc Black_Perc Age_1624_Perc PerCapWageL PopDensityL"

    foreach dep_var in `dep_vars' {
    mi estimate, vartable: xtreg `dep_var' `indep_vars' L.Major L.Minor i.year , fe vce(cluster fips)
    mi predict xb(`dep_var'_pred)
    }

  • #2
    I am not sure about the loop, as you apparently want to estimate many models, but your general approach is correct because you use mi estimate. If you use xtreg, you also need to run
    Code:
    mi xtset
    beforehand.
    Best wishes

    (Stata 16.1 MP)



    • #3
      I have xtset the data; maybe I missed sharing that code here. What I am unable to understand is how to pool the estimated results. From what I understand, after the analysis (mi estimate) stage I would get a single set of pooled observations, right? How do I get that?



      • #4
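        No, mi estimate does all the work for you: it fits the model on each imputed dataset and automatically pools the resulting estimates following Rubin's rules. The result is one set of pooled coefficients and standard errors, not a pooled dataset. Have a look at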
        Code:
        help mi estimate
        Best wishes

        (Stata 16.1 MP)
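
For reference, Rubin's rules for a single coefficient are usually written as below. The notation is introduced here for illustration and does not come from the thread: M is the number of imputations (20 in #1), \hat{q}_m is the coefficient estimated on imputation m, and U_m is its squared standard error.

```latex
\begin{align*}
\bar{q} &= \frac{1}{M}\sum_{m=1}^{M}\hat{q}_m
  && \text{(pooled point estimate)} \\
\bar{U} &= \frac{1}{M}\sum_{m=1}^{M} U_m
  && \text{(within-imputation variance)} \\
B &= \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{q}_m-\bar{q}\right)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \left(1+\frac{1}{M}\right)B
  && \text{(total variance)}
\end{align*}
```

The pooled standard error is the square root of T. The point estimate is a plain average across imputations, but the variance is not, because it also has to carry the between-imputation spread B.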



        • #5
          Please pardon my ignorance, but are these estimates then stored in the original imputations themselves? I am a little confused: where are the pooled estimates stored? How do I replace the missing observations with the estimated and pooled results?



          • #6
            This information is given in

            Code:
            help mi set
            The imputed data are stored in the long format if you have used flong. The dataset (browse) contains the original data (_mi_m == 0) and the imputed data (_mi_m > 0).
            Best wishes

            (Stata 16.1 MP)



            • #7
              No estimation commands in Stata store their results in the data set, as far as I am aware. Estimation commands store the coefficients in e(b) and the variance-covariance matrix in e(V). Peculiarly, -mi estimate- does not do that by default, but you can ask for it by adding the -post- option to the -mi estimate:- prefix. So if you do
              Code:
              mi estimate, post vartable: xtreg `dep_var' `indep_vars' L.Major L.Minor i.year , fe vce(cluster fips)
              then after the regressions are all run, the pooled estimates will be found in e(b) and their variance-covariance matrix in e(V).

              Perhaps more conveniently, the results that are displayed in the results table (coefficients, standard errors, test statistics) are also stored in r(table). Do remember that any Stata results stored in r() are at risk of being overwritten by subsequent commands, so if you want to use these, it behooves you to store r(table) as a matrix immediately after the -mi estimate- command.
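
A minimal sketch of the -post- route, assuming the imputed data from #1 are in memory and have already been -mi xtset-. The single outcome (Murder), the shortened regressor list, and the matrix names b_pooled and V_pooled are chosen here for illustration only.

```stata
* Assumes the mi data from #1 are loaded and -mi xtset- has been run.
* Murder is one of the outcomes in #1; the regressor list is shortened.
mi estimate, post: xtreg Murder Tot_Officers Unemp_Rate i.year, fe vce(cluster fips)

* Copy the pooled results right away, before a later command replaces them.
* b_pooled and V_pooled are hypothetical matrix names.
matrix b_pooled = e(b)
matrix V_pooled = e(V)
matrix list b_pooled

* With -post-, single pooled coefficients and standard errors can be read by name.
display _b[Tot_Officers]
display _se[Tot_Officers]
```

Inside the loop from #1, the matrix names would need the outcome name appended (for example b_`dep_var') so that each model's results are kept separately.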



              • #8
                Clyde,

                Thank you very much for responding. I am guessing that mi estimate will yield estimated coefficients for each row of the data, right?



                • #9
                  I am guessing that mi estimate will yield estimated coefficients for each row of the data, right?
                  No! The output of an -mi estimate- command will resemble the output of whatever regression command is being -mi estimate-d. There will be one estimated coefficient for each right hand side variable of the regression model (where, for discrete variables, the number of "variables" is going to be the number of levels of the discrete variable minus 1).

                  Are you thinking about something analogous to the -predict- command, which produces an estimated value of the outcome variable for each observation ("row") in the data set? For that, you use the -mi predict- command. You have mentioned -mi predict- in your code in #1, but the syntax you use there is incorrect. Do read the help file to see how it is used. Bear in mind that -mi predict- requires that you save the -mi estimate- results to a file first. That is not done by the methods I mentioned in #7; it is done with the saving() option of -mi estimate-. Do read the help file for that as well if you are not already familiar with it.
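
A minimal sketch of that workflow, again assuming the mi data from #1 are in memory and -mi xtset-. The file name miest_murder and the new variable Murder_pred are hypothetical, and the model is shortened to a single outcome with fewer regressors.

```stata
* Save the per-imputation estimation results to a file.
* miest_murder is a hypothetical file name; replace overwrites an old copy.
mi estimate, saving(miest_murder, replace): xtreg Murder Tot_Officers Unemp_Rate i.year, fe vce(cluster fips)

* Create one predicted value per observation from the saved results.
* By default -mi predict- computes the linear prediction (xb).
mi predict Murder_pred using miest_murder
```

These predictions are fitted values from the analysis model. They are a different thing from the imputed values that -mi impute chained- created for the missing observations.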



                  • #10
                    I think I have some clarity on the process now. I guess the mi estimate command runs the regression that I would have run had the data been complete in the first place. On a different note, can you tell me whether averaging all the imputed datasets and using them separately (without the mi wrapper) in regressions should yield results similar to those from the mi estimate command?



                    • #11
                      I think that averaging the coefficients from separate regressions on the imputed data sets does produce the same coefficients you get from -mi estimate-. But the standard errors, and the other statistics that depend on them, are not that simple.



                      • #12
                        Just to confirm that we are on the same page, here's what I am thinking. I would like to use the `var'_ImpMean to replace the missing observations and then collapse the data by county year, which I could later use to run regressions without the "mi" wrapper. Is that what you also mean by averaging coefficients?

                        mi impute chained (pmm, knn(15)) Murder Manslaughter Rape Robbery Assault Burglary Larceny_Theft ///
                        Vehicle_Theft Other_Assault Arson Forgery Fraud Embezzlement Stolen_Property Vandalism Weapons ///
                        Prostitution Drugs_Offenses Drugs_Sale Drugs_Possesion DUI Disorderly Violent Property Total_Crime ///
                        = Tot_Officers Tot_Civ_Emp i.Major i.Minor Unemp_Rate Male_Perc Hispanic_Perc ///
                        Black_Perc Age_1624_Perc PerCapWageL PopDensityL ///
                        , add(20) rseed(12345) nolegend noisily bootstrap
                        ******************************************************************************
                        ///noimputed bootstrap

                        local vars "Murder Manslaughter Rape Robbery Assault Burglary Larceny_Theft Vehicle_Theft Other_Assault Arson Forgery Fraud Embezzlement Stolen_Property Vandalism Weapons Prostitution Drugs_Offenses Drugs_Sale Drugs_Possesion DUI Disorderly Violent Property Total_Crime"
                        foreach var of local vars {
                        egen `var'_ImpMean = rowmean(_1_`var'-_20_`var')
                        replace `var'_ImpMean = round(`var'_ImpMean)
                        }
                        Last edited by Anupam Ghosh; 08 Dec 2024, 13:42.



                        • #13
                          Just to confirm that we are on the same page, here's what I am thinking. I would like to use the `var'_ImpMean to replace the missing observations and then collapse the data by county year, which I could later use to run regressions without the "mi" wrapper. Is that what you also mean by averaging coefficients?
                          If all you care about are the coefficients themselves, you could do that. But you would not have usable standard errors, test statistics, p-values, or confidence intervals using this method. Why not just run -mi estimate- again? That gives you complete and useful statistics all around.
