  • Data still missing variables after multiple imputation

    Hi,

    I've been trying to impute missing data with multiple imputation. I have GDPG as my dependent variable and OILR, FDI, IMP, AGR, IND, SER as my independent variables. All independent variables have missing values. I used the following command:

    mi impute regress OILR FDI IMP AGR IND SER, add(20) rseed(1234)

    Then I checked the dataset and all values were still missing. I then used this command:

    mi impute regress OILR FDI IMP AGR IND SER GDPG, add(20) rseed(1234)

    And still no luck. I also tried imputing each variable one at a time along with GDPG (e.g. mi impute regress OILR GDPG, add(20) rseed(1234)), but that did not work either.

    Can someone please advise me on what to do in this case?

    Thanks.
    Last edited by alex badalyan; 12 Apr 2021, 13:43.

  • #2
    Can you show the output that you get from "mi impute regress"?

    • #3
      FernandoRios

      mi impute regress OILR FDI IMP AGR IND SER, add(20) rseed(1234)
      note: variables FDI IMP AGR IND SER registered as imputed and used to model variable OILR; this
      may cause some observations to be omitted from the estimation and may lead to missing
      imputed values
      OILR: missing imputed values produced
      This may occur when imputation variables are used as independent variables or when
      independent variables contain missing values. You can specify option force if you wish to
      proceed anyway.
      r(498);



      mi impute regress OILR FDI IMP AGR IND SER GDPG, add(20) rseed(1234)
      note: variables FDI IMP AGR IND SER registered as imputed and used to model variable OILR; this
      may cause some observations to be omitted from the estimation and may lead to missing
      imputed values
      OILR: missing imputed values produced
      This may occur when imputation variables are used as independent variables or when
      independent variables contain missing values. You can specify option force if you wish to
      proceed anyway.
      r(498);




      mi impute regress OILR GDPG, add(20) rseed(1234)

      Univariate imputation                       Imputations =       90
      Linear regression                                 added =       20
      Imputed: m=71 through m=90                      updated =        0

      ------------------------------------------------------------------
                         |               Observations per m
                         |----------------------------------------------
                Variable |   Complete   Incomplete   Imputed |     Total
      -------------------+-----------------------------------+----------
                    OILR |        248           12        12 |       260
      ------------------------------------------------------------------
      (complete + incomplete = total; imputed is the minimum across m
       of the number of filled-in observations.)

      • #4
        Did you check that at least one of the variables is non-missing for all observations?
        If you have cases where ALL are missing, mi impute cannot do much.
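
        One way to check this, as a minimal sketch using the variable names from #1 and assuming the data are already mi set (mi misstable and mi xeq 0 look only at the original data, m=0):

        ```stata
        * Number of missing values per variable in the original data (m=0)
        mi misstable summarize OILR FDI IMP AGR IND SER GDPG

        * Which variables tend to be missing together
        mi misstable patterns OILR FDI IMP AGR IND SER GDPG

        * Count observations in m=0 where every independent variable is missing
        mi xeq 0: count if missing(OILR) & missing(FDI) & missing(IMP) & missing(AGR) & missing(IND) & missing(SER)
        ```

        A nonzero count from the last command would point to observations for which the imputation model has only GDPG to work with.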

        • #5
          The GDPG has all values present (this is the dependent variable), but all of the independent variables have missing values.

          • #6
            You want something like

            Code:
            mi impute chained (regress) OILR FDI IMP AGR IND SER = GDPG , add(20) rseed(1234)
            assuming that (i) none of the variables on the right-hand side of the equals sign has missing values, (ii) all the variables on the left-hand side of the equals sign are continuous, and (iii) linear regression is a reasonable model to fill in the respective missing values.

            You want to make sure that your imputation model includes all variables -- including the dependent variable -- that you will use in your analyses later. Any variable that you omit from the imputation model will have its association to the other variables biased towards zero; the same is true for any non-linear associations and really anything that is not built into the imputation model.
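
            For context, a minimal sketch of the full sequence around that command. It assumes the data in memory are the original, not-yet-imputed data; the mlong style is only an illustrative choice and is not something stated in this thread. If the data are already mi set and registered, only the last line applies.

            ```stata
            * Declare the data as mi data (style chosen here for illustration)
            mi set mlong

            * Variables with missing values that are to be imputed
            mi register imputed OILR FDI IMP AGR IND SER

            * Fully observed variable used as a predictor in every conditional model
            mi register regular GDPG

            * Chained equations with linear regression for each incomplete variable
            mi impute chained (regress) OILR FDI IMP AGR IND SER = GDPG, add(20) rseed(1234)
            ```

            Note that add(20) appends 20 imputations to whatever imputations already exist, so repeated runs accumulate.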

            • #7
              Hi Daniel,

              I actually tried this as well and tried it again and I get this in the output (all my variables fit with the assumptions you outlined):


              mi impute chained (regress) OILR FDI IMP AGR IND SER = GDPG, add(20) rseed(1234)

              Conditional models:
              IMP: regress IMP FDI OILR AGR IND SER GDPG
              FDI: regress FDI IMP OILR AGR IND SER GDPG
              OILR: regress OILR IMP FDI AGR IND SER GDPG
              AGR: regress AGR IMP FDI OILR IND SER GDPG
              IND: regress IND IMP FDI OILR AGR SER GDPG
              SER: regress SER IMP FDI OILR AGR IND GDPG

              Performing chained iterations ...

              Multivariate imputation                     Imputations =      130
              Chained equations                                 added =       20
              Imputed: m=111 through m=130                    updated =        0

              Initialization: monotone                     Iterations =      200
                                                              burn-in =       10

              OILR: linear regression
              FDI: linear regression
              IMP: linear regression
              AGR: linear regression
              IND: linear regression
              SER: linear regression

              ------------------------------------------------------------------
                                 |               Observations per m
                                 |----------------------------------------------
                        Variable |   Complete   Incomplete   Imputed |     Total
              -------------------+-----------------------------------+----------
                            OILR |        248           12        12 |       260
                             FDI |        249           11        11 |       260
                             IMP |        252            8         8 |       260
                             AGR |        248           12        12 |       260
                             IND |        233           27        27 |       260
                             SER |        233           27        27 |       260
              ------------------------------------------------------------------
              (complete + incomplete = total; imputed is the minimum across m
               of the number of filled-in observations.)

              .



              However all the missing values still remain in my dataset…

              • #8
                Yes, the original dataset still has missing values. That is supposed to be the case.

                I get the impression that you are fairly new to multiple imputation. I cannot tell whether that applies only to the technical details of Stata or also to the theoretical foundations of the approach. A forum discussion will probably not compensate for the latter. I would recommend that you stop here, take a step back, and start by reading (at least) pages 1--15 of [MI] Multiple Imputation.
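
                To see the difference between the original data and a completed dataset, here is a hedged sketch using the variable names from #1. It assumes the imputations added in #7 are still present, so that imputation m=1 exists.

                ```stata
                * m=0 is the original data; the observation counts reflect the missing values
                mi xeq 0: summarize OILR FDI IMP AGR IND SER

                * m=1 is the first completed dataset; the missing values are filled in here
                mi xeq 1: summarize OILR FDI IMP AGR IND SER

                * Temporarily load completed dataset m=1 into memory to inspect it
                preserve
                mi extract 1, clear
                list OILR FDI IMP AGR IND SER in 1/10
                restore
                ```

                The imputed values are stored alongside the original data rather than written over it, which is why browsing the original observations still shows missing values.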

                • #9
                  Hi Daniel,

                  I have read over the Multiple Imputation manual but still don't understand what I'm doing wrong, or why, after imputing, the newly generated imputed values are not in my dataset.

                  • #10
                    What happens if you actually run the imputed analysis? Like
                    Code:
                    mi estimate: regress GDPG OILR FDI IMP AGR IND SER
                    Can you provide the output?
                    Best wishes

                    (Stata 16.1 MP)

                    • #11
                      Hi Felix,

                      Here is the output:


                      mi estimate: regress GDPG OILR FDI IMP AGR IND SER


                      Multiple-imputation estimates                   Imputations       =         20
                      Linear regression                               Number of obs     =        260
                                                                      Average RVI       =     0.2623
                                                                      Largest FMI       =     0.5038
                                                                      Complete DF       =        253
                      DF adjustment:   Small sample                   DF:     min       =      49.21
                                                                              avg       =     114.79
                                                                              max       =     176.96
                      Model F test:       Equal FMI                   F(   6,  220.5)   =       6.11
                      Within VCE type:          OLS                   Prob > F          =     0.0000

                      ------------------------------------------------------------------------------
                              GDPG |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                              OILR |   .1569129   .0720589     2.18   0.034      .012121    .3017049
                               FDI |   .8074652   .2665314     3.03   0.003     .2774461    1.337484
                               IMP |   .0143569   .0400254     0.36   0.720      -.06469    .0934039
                               AGR |   .3803026   .1657976     2.29   0.023     .0521299    .7084754
                               IND |  -.1397202   .1097591    -1.27   0.208    -.3594361    .0799956
                               SER |  -.0922856   .0729068    -1.27   0.207    -.2361643     .051593
                             _cons |   8.334273   6.398373     1.30   0.195    -4.306914    20.97546
                      ------------------------------------------------------------------------------

                      Kind regards

                      • #12
                        The outputs in #7 and #11 suggest that you have 20 completed datasets with 260 observations each. I do not know where your confusion comes from.

                        Show (in code delimiters, as Felix did) the results of

                        Code:
                        mi query
                        Perhaps you have used flongsep style, in which case the dataset in memory only includes the original observations with missing values while the completed datasets are stored separately on disk.
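
                        A short sketch of how the style can be checked and, if needed, changed. The conversion to flong is only one possible choice, shown here as an assumption about what would make the imputed values easiest to see.

                        ```stata
                        * Report the mi style and the number of imputations
                        mi query
                        mi describe

                        * If the style is flongsep, bring all imputations into the dataset in memory
                        mi convert flong, clear

                        * In flong (and mlong) style, _mi_m marks the imputation: 0 = original data
                        tabulate _mi_m
                        ```

                        In wide style there is no _mi_m variable; the imputed values are instead held in additional variables named like _1_OILR, _2_OILR, and so on.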
