Using mvencode for continuous variable

Rajini Nagrani

Join Date: Jun 2018

Posts: 13
#1


03 Mar 2022, 07:09

Dear Stata users,

I would like to change missing values to numeric values in a continuous variable so that I can use the missing indicator method in a regression analysis. I am aware this is possible for categorical variables (using mvencode), but I am not sure if it is possible with continuous variables. Is there a workaround that would allow me to include missing values using this method?

Thank you in advance.
Nick Cox

Join Date: Mar 2014

Posts: 35212
#2

03 Mar 2022, 07:33

Please give an example of what you want to do.

Rajini Nagrani

Join Date: Jun 2018
Posts: 13

03 Mar 2022, 08:13

Dear Nick,

Thank you for your prompt reply. I have included my output below. Although I am able to incorporate missing values of categorical variables (by coding them 99), I would like to know how I should include missing values of continuous variables (OsmolalitymOsmol NKRE packyr_parent).

I was wondering whether I should round them to the nearest whole number and use them without the c. prefix, so that I can code their missing values to 99 too and use them in my regression analysis. However, I am not sure whether this would be a correct strategy.


. mvencode isced_cat2011_T alc_lifecat club_mbr_T if survey==3, mv(99)
isced_cat2~T: 3 missing values recoded
 alc_lifecat: 27 missing values recoded
  club_mbr_T: 15 missing values recoded

. logistic alert_meta_c ln_particle c.Age i.sex c.OsmolalitymOsmol c.NKRE i.Country i.isced_cat2011_T c.AVM_1_week_T i
> .club_mbr_T i.alc_lifecat i.dm_fhT c.packyr_parent if survey==3
note: 99.isced_cat2011_T != 0 predicts failure perfectly
      99.isced_cat2011_T dropped and 2 obs not used

note: 99.club_mbr_T != 0 predicts failure perfectly
      99.club_mbr_T dropped and 3 obs not used


Logistic regression                             Number of obs     =        329
                                                LR chi2(19)       =      23.69
                                                Prob > chi2       =     0.2082
Log likelihood = -76.608337                     Pseudo R2         =     0.1339

----------------------------------------------------------------------------------
    alert_meta_c | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
     ln_particle |    1.54518   .2878729     2.34   0.020     1.072499    2.226185
             Age |   1.149376   .1962292     0.82   0.415      .822506    1.606145
           2.sex |   1.036745   .5036792     0.07   0.941     .4000693    2.686637
OsmolalitymOsmol |   .9996726   .0012579    -0.26   0.795     .9972102    1.002141
       NKRE_NM_T |   .9999995   .0000548    -0.01   0.993     .9998921    1.000107
                 |
         Country |
        Estonia  |   .3973304   .4379087    -0.84   0.402     .0458162    3.445753
        Belgium  |    .551809   .6286189    -0.52   0.602     .0591702    5.146058
         Sweden  |   .3587412   .3224972    -1.14   0.254     .0615995    2.089224
        Germany  |   .2142229   .2389113    -1.38   0.167      .024075    1.906187
        Hungary  |   .4205498   .3661207    -0.99   0.320     .0763451    2.316613
          Spain  |   .7271148   .8634607    -0.27   0.788      .070923    7.454503
                 |
 isced_cat2011_T |
              2  |   .7813814   .5376551    -0.36   0.720     .2028458    3.009956
              3  |    .280049   .2450871    -1.45   0.146     .0503846    1.556574
             99  |          1  (empty)
                 |
    AVM_1_week_T |   1.030166   .0239067     1.28   0.200     .9843599    1.078105
                 |
      club_mbr_T |
              2  |   1.103071   .5560308     0.19   0.846     .4107089    2.962601
             99  |          1  (empty)
                 |
     alc_lifecat |
              1  |   .7263582   .4970298    -0.47   0.640     .1899738    2.777206
             99  |   1.882165   2.208862     0.54   0.590     .1886726    18.77615
                 |
        1.dm_fhT |   4.802401   4.476905     1.68   0.092     .7725907    29.85159
   packyr_parent |   1.013131    .017896     0.74   0.460     .9786554     1.04882
           _cons |   .0000902   .0003136    -2.68   0.007     9.93e-08    .0820068
----------------------------------------------------------------------------------
Note: _cons estimates baseline odds.
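
For reference, a minimal sketch of how the missing indicator method is usually set up for a continuous covariate, using packyr_parent from the model above. The names packyr_miss and packyr_fill and the fill value 0 are assumptions for illustration only. Rounding is not needed for this: mvencode accepts any numeric variable, but a code such as 99 in a variable entered with c. would be treated as a real measurement, so the filled copy has to go into the model together with its own indicator. The bias concerns raised in this thread about ad hoc missing categories would still apply.

```stata
* Sketch only: packyr_miss and packyr_fill are hypothetical names, not from the thread.
* Flag the observations where the continuous covariate is missing (1 = missing).
generate byte packyr_miss = missing(packyr_parent)

* Copy the covariate, replacing missing values with an arbitrary constant (0 here).
generate double packyr_fill = cond(missing(packyr_parent), 0, packyr_parent)

* Enter the filled copy as continuous together with its missing indicator.
logistic alert_meta_c ln_particle c.packyr_fill i.packyr_miss if survey==3
```

With this setup the coefficient on packyr_fill is estimated from the observed values only, and the indicator absorbs the observations that were filled in, so the choice of fill constant changes the indicator's coefficient but not the slope.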


Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2371
#4

03 Mar 2022, 08:45

Though I am able to incorporate missing values of categorical variables (by coding them 99).

As a complete aside, this practice of using a "missing" category is known to bias results. See the following paper for commentary and examples:

Vach, W., & Blettner, M. (1991). Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods of correcting for missing values for confounding variables. American Journal of Epidemiology, 134(8), 895–907. https://doi.org/10.1093/oxfordjournals.aje.a116164
Rajini Nagrani

Join Date: Jun 2018

Posts: 13
#5

08 Mar 2022, 03:11

Thank you, Leonardo, for sharing this paper. However, this method was suggested to me as a way to address the drop in sample size in the adjusted model, compared with the crude model, that results from missing covariates. We also thought of multiple imputation; however, since we only need to deal with missing covariates (and not exposure or outcome variables), this method is under consideration.
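
For comparison, a rough sketch of what the multiple imputation alternative mentioned above could look like when only covariates are missing. Everything beyond the variable names in the posted command is an assumption: the linear imputation models, the 20 imputations, the seed, and the choice of predictors, which are assumed here to be complete for the survey==3 observations.

```stata
* Sketch only: imputation models, add(20) and rseed() are illustrative assumptions.
mi set wide

* Declare the incomplete continuous covariates as variables to be imputed.
mi register imputed packyr_parent OsmolalitymOsmol NKRE

* Impute by chained equations; the outcome is included as a predictor.
mi impute chained (regress) packyr_parent OsmolalitymOsmol NKRE ///
    = alert_meta_c ln_particle Age i.sex i.Country if survey==3, ///
    add(20) rseed(12345)

* Fit the logistic model in each imputed dataset and pool with Rubin's rules.
mi estimate, or: logistic alert_meta_c ln_particle c.Age i.sex ///
    c.OsmolalitymOsmol c.NKRE i.Country c.packyr_parent if survey==3
```

The remaining covariates from the original model are left out here only to keep the sketch short; incomplete categorical covariates would need their own imputation method in the mi impute chained call.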