  • Using mvencode for continuous variable

    Dear Stata users,

    I would like to recode missing values to numeric values in continuous variables so that I can use them in the missing-indicator method for regression analysis. I am aware this is possible for categorical variables (using mvencode), but I am not sure if it is possible with continuous variables. Is there a workaround that would allow me to include missing values using this method?

    Thank you in advance.

  • #2
    Please give an example of what you want to do.


    • #3
      Dear Nick,

      Thank you for your prompt reply. I have included my output below. I am able to incorporate missing values of categorical variables (by coding them 99), but I would like to know how I should include missing values of continuous variables (OsmolalitymOsmol NKRE packyr_parent).

      I was wondering whether I should round them to the nearest whole number and enter them without the c. prefix, so that I could code their missing values to 99 too and use them in my regression analysis. However, I am not sure whether this would be a correct strategy.

      HTML Code:
      . mvencode isced_cat2011_T alc_lifecat club_mbr_T if survey==3, mv(99)
      isced_cat2~T: 3 missing values recoded
       alc_lifecat: 27 missing values recoded
        club_mbr_T: 15 missing values recoded
      
      . logistic alert_meta_c ln_particle c.Age i.sex c.OsmolalitymOsmol c.NKRE i.Country i.isced_cat2011_T c.AVM_1_week_T i
      > .club_mbr_T i.alc_lifecat i.dm_fhT c.packyr_parent if survey==3
      note: 99.isced_cat2011_T != 0 predicts failure perfectly
            99.isced_cat2011_T dropped and 2 obs not used
      
      note: 99.club_mbr_T != 0 predicts failure perfectly
            99.club_mbr_T dropped and 3 obs not used
      
      
      Logistic regression                             Number of obs     =        329
                                                      LR chi2(19)       =      23.69
                                                      Prob > chi2       =     0.2082
      Log likelihood = -76.608337                     Pseudo R2         =     0.1339
      
      ----------------------------------------------------------------------------------
          alert_meta_c | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -----------------+----------------------------------------------------------------
           ln_particle |    1.54518   .2878729     2.34   0.020     1.072499    2.226185
                   Age |   1.149376   .1962292     0.82   0.415      .822506    1.606145
                 2.sex |   1.036745   .5036792     0.07   0.941     .4000693    2.686637
      OsmolalitymOsmol |   .9996726   .0012579    -0.26   0.795     .9972102    1.002141
             NKRE_NM_T |   .9999995   .0000548    -0.01   0.993     .9998921    1.000107
                       |
               Country |
              Estonia  |   .3973304   .4379087    -0.84   0.402     .0458162    3.445753
              Belgium  |    .551809   .6286189    -0.52   0.602     .0591702    5.146058
               Sweden  |   .3587412   .3224972    -1.14   0.254     .0615995    2.089224
              Germany  |   .2142229   .2389113    -1.38   0.167      .024075    1.906187
              Hungary  |   .4205498   .3661207    -0.99   0.320     .0763451    2.316613
                Spain  |   .7271148   .8634607    -0.27   0.788      .070923    7.454503
                       |
       isced_cat2011_T |
                    2  |   .7813814   .5376551    -0.36   0.720     .2028458    3.009956
                    3  |    .280049   .2450871    -1.45   0.146     .0503846    1.556574
                   99  |          1  (empty)
                       |
          AVM_1_week_T |   1.030166   .0239067     1.28   0.200     .9843599    1.078105
                       |
            club_mbr_T |
                    2  |   1.103071   .5560308     0.19   0.846     .4107089    2.962601
                   99  |          1  (empty)
                       |
           alc_lifecat |
                    1  |   .7263582   .4970298    -0.47   0.640     .1899738    2.777206
                   99  |   1.882165   2.208862     0.54   0.590     .1886726    18.77615
                       |
              1.dm_fhT |   4.802401   4.476905     1.68   0.092     .7725907    29.85159
         packyr_parent |   1.013131    .017896     0.74   0.460     .9786554     1.04882
                 _cons |   .0000902   .0003136    -2.68   0.007     9.93e-08    .0820068
      ----------------------------------------------------------------------------------
      Note: _cons estimates baseline odds.


      • #4
        "I am able to incorporate missing values of categorical variables (by coding them 99)"
        As a complete aside, this practice of using a "missing" category is known to bias results. See the following paper for commentary and examples:

        Vach, W., & Blettner, M. (1991). Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods of correcting for missing values for confounding variables. American Journal of Epidemiology, 134(8), 895–907. https://doi.org/10.1093/oxfordjournals.aje.a116164
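
        With that caveat in mind, here is a minimal sketch of what the missing-indicator method looks like for a continuous covariate, since that was the original question. No rounding is needed: flag the missing values in a 0/1 indicator, fill them with an arbitrary constant, and enter both terms in the model. The sketch uses Stata's shipped auto data with artificially created missingness, and weight_miss and weight_fill are hypothetical names chosen for illustration; none of this comes from the poster's data.

        ```stata
        * Illustrative only: Stata's shipped auto data, not the poster's data
        sysuse auto, clear

        * Assumption for the demo: make every 10th value of a continuous covariate missing
        replace weight = . if mod(_n, 10) == 0

        * 1. Flag the observations whose value is missing (1 = missing, 0 = observed)
        generate byte weight_miss = missing(weight)

        * 2. Fill the missing values with an arbitrary constant (here 0)
        generate double weight_fill = cond(missing(weight), 0, weight)

        * 3. Enter the filled covariate as continuous, plus the indicator
        logistic foreign c.weight_fill i.weight_miss c.mpg

        * Check how many observations the model used
        display e(N)
        ```

        With the poster's variables the same pattern would apply to OsmolalitymOsmol, NKRE and packyr_parent, with one indicator per covariate. The choice of fill value only shifts the coefficient on the indicator; it does not remove the bias discussed in the paper above.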


        • #5
          Thank you, Leonardo, for sharing this paper. However, this method was suggested to me as a way to adjust for the drop in the number of observations in the adjusted model compared with the crude model, which is due to missing covariates. We also thought of multiple imputation; however, since we only wanted to deal with missing covariates (and not exposure or outcome variables), the missing-indicator method is under consideration.
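
          For comparison, the multiple-imputation route mentioned here can be sketched in a few lines. This is only an illustration on Stata's shipped auto data with artificial missingness in one continuous covariate; the imputation model, the 20 imputations, and the seed are assumptions, not a recommendation for the actual analysis.

          ```stata
          * Illustrative only: Stata's shipped auto data, not the poster's data
          sysuse auto, clear

          * Assumption for the demo: make every 10th value of a continuous covariate missing
          replace weight = . if mod(_n, 10) == 0

          * Declare the mi storage style and the variable to be imputed
          mi set wide
          mi register imputed weight

          * Impute weight by linear regression on the outcome and the other covariate
          mi impute regress weight mpg foreign, add(20) rseed(12345)

          * Fit the model in each imputed dataset and combine the results
          mi estimate, or: logistic foreign c.weight c.mpg
          ```

          Both the outcome (foreign) and the other covariate (mpg) enter the imputation model here, which is the usual practice when only covariates are incomplete.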
