  • How to conduct sensitivity analysis in multiple linear regression model?

    Dear all,

    When missingness occurs on both the dependent and the independent variables, how should one conduct a sensitivity analysis?

    Look forward to your reply!


    Best regards,
    Raoping

  • #2
    You first need to define what kind of sensitivity you are interested in investigating. That will help you find a family of models you could estimate. You estimate them, and you see if they result in different findings.

    This is a very general answer. If you give us more details, then we can try to give you a more specific answer.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Raoping:
      as an aside to Maarten's wise reply: when dealing with missing data, a sensitivity analysis is usually recommended when values are missing not at random (see http://www.stefvanbuuren.nl/publicat...ed%201999.pdf; paragraph 3.4).
      Kind regards,
      Carlo
      (StataNow 18.5)

      • #4
        Originally posted by Carlo Lazzaro View Post
        Raoping:
        as an aside to Maarten's wise reply: when dealing with missing data, a sensitivity analysis is usually recommended when values are missing not at random (see http://www.stefvanbuuren.nl/publicat...ed%201999.pdf; paragraph 3.4).
        Hi,
        Carlo, the link you shared is not working. Would you please let me know the title and authors of that document?


        Thank you!

        Raoping

        • #5
          Raoping:
          sorry for the mishap.
          It works for me when clicked on:
          http://www.stefvanbuuren.nl/publicat...Med%201999.pdf
          Kind regards,
          Carlo
          (StataNow 18.5)

          • #6
            Originally posted by Carlo Lazzaro View Post
            Raoping:
            sorry for the mishap.
            It works for me when clicked on:
            http://www.stefvanbuuren.nl/publicat...Med%201999.pdf
            Thank you! Now it is working.


            Raoping

            • #7
              Originally posted by Maarten Buis View Post
              You first need to define what kind of sensitivity you are interested in investigating. That will help you find a family of models you could estimate. You estimate them, and you see if they result in different findings.

              This is a very general answer. If you give us more details, then we can try to give you a more specific answer.
              Dear Maarten,

              I would like to examine the association between psychological distress and C-reactive protein (CRP) among people with different levels of education. Unfortunately, 30% of the CRP values (the dependent variable) are missing, and almost 20% are missing in BMI (a covariate). I want to see whether the 30% missingness in CRP biases my analysis, that is, whether my results are stable.


              Thank you!

              Best regards,
              Raoping

              • #8
                So you want to look at the impact of missing values on your estimate. When you talk about robustness, you are actually asking how different models (for dealing with missing values) lead to similar or different results. So you need to estimate different models for dealing with missing values. The simplest is to ignore all observations with at least one missing value; this is what Stata does if you estimate a "normal" model. You can use mi for your second model (see help mi), but for quickly getting a first impression I tend to prefer weighting. You can quickly compute the weights that adjust for missing values yourself and compare a weighted and an unweighted model. You cannot control for missing values in the dependent/explained/y-variable this way, but as a first impression this has served me well.

                Code:
                // open example dataset
                sysuse nlsw88, clear
                
                // compute weights
                gen obs = !missing(union, tenure) // binary variable: 0 missing on the xs, 1 observed on the xs
                xtile cat = wage, nq(10)  // split the dependent variable up in 10 equally well filled groups
                logit obs i.cat // how does the chance of being observed depend on the dependent variable
                predict double w if wage < ., pr // predict chance of being observed
                replace w = 1/w // weight = 1/chance
                
                // compare weighted and unweighted model
                reg wage union tenure [pw=w]
                reg wage union tenure

                • #9
                  Originally posted by Maarten Buis View Post
                  So you want to look at the impact of missing values on your estimate. When you talk about robustness, you are actually asking how different models (for dealing with missing values) lead to similar or different results. So you need to estimate different models for dealing with missing values. The simplest is to ignore all observations with at least one missing value; this is what Stata does if you estimate a "normal" model. You can use mi for your second model (see help mi), but for quickly getting a first impression I tend to prefer weighting. You can quickly compute the weights that adjust for missing values yourself and compare a weighted and an unweighted model. You cannot control for missing values in the dependent/explained/y-variable this way, but as a first impression this has served me well.

                  Code:
                  // open example dataset
                  sysuse nlsw88, clear
                  
                  // compute weights
                  gen obs = !missing(union, tenure) // binary variable: 0 missing on the xs, 1 observed on the xs
                  xtile cat = wage, nq(10) // split the dependent variable up in 10 equally well filled groups
                  logit obs i.cat // how does the chance of being observed depend on the dependent variable
                  predict double w if wage < ., pr // predict chance of being observed
                  replace w = 1/w // weight = 1/chance
                  
                  // compare weighted and unweighted model
                  reg wage union tenure [pw=w]
                  reg wage union tenure
                  Dear Maarten,

                   Thank you for the explanation! Is the method you recommend also applicable to missing values on the dependent variable? My main purpose is to look at the impact of missing dependent values on my estimation. Much of the literature indicates that it is not a good idea to impute a missing dependent variable by multiple imputation, so now I am not sure how to proceed.


                  Look forward to your reply!


                  Raoping

                  • #10
                    The way I used weighting cannot be used to correct for missing values on the dependent variable.

                    As to multiple imputation, your reading of the literature is wrong: You should definitely impute missing values on the dependent variable. Not doing so will seriously bias your imputation results.

                    It is even more complex than that: not including your dependent variable in the imputation model is seriously wrong. So it has to be in the imputation models, and the imputations must be used. However, in a correct imputation model the imputations of the dependent variable should not change much compared to a model that simply leaves those missing values out. That has to do with the MAR assumption that underlies multiple imputation and the fact that missing values bias results when the missingness depends on the dependent variable. So if you find a difference between your MI model and a regular model, that is due to imputing the independent variables.
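
                    A minimal Stata sketch of this advice, using hypothetical variable names taken from the poster's description (crp as the dependent variable; distress, bmi, and educ as covariates; adapt to your own data):

```stata
// hypothetical variable names; incomplete variables must be registered with mi
mi set wide
mi register imputed crp bmi
// impute crp and bmi jointly, with distress and education as complete predictors;
// the dependent variable crp is part of the imputation model, as recommended above
mi impute chained (regress) crp bmi = distress i.educ, add(20) rseed(12345)
// analysis model, with estimates combined across imputations by Rubin's rules
mi estimate: regress crp distress bmi i.educ
```

                    Comparing these pooled estimates with the complete-case regress results is then the sensitivity check described above: under MAR, imputing the dependent variable should change little, so a noticeable difference points mainly to the imputed covariates.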

                    If you worry about the impact of missing values on the dependent variable, then you have to relax the MAR assumption. Now you are in trouble. Models for that exist, e.g. heckman. However, for my taste, they depend too much on all kinds of untestable assumptions.
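
                    A hedged sketch of such a selection model in Stata (variable names hypothetical; in practice heckman also wants at least one exclusion variable that predicts selection but not the outcome, here called contact):

```stata
// indicator for whether the dependent variable is observed
gen byte observed = !missing(crp)
// Heckman selection model: outcome equation plus selection equation
heckman crp distress bmi i.educ, ///
    select(observed = distress bmi i.educ contact)
```

                    The untestable assumptions mentioned above enter through the joint normality of the two error terms and through the choice of exclusion variable.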

                    • #11
                      Maarten Buis Following your example in #8, the two regression models are quite similar, so we could infer that the sensitivity analysis performs just fine (in terms of corroborating the proposed model).

                      Then, I typed afterwards:

                      Code:
                      . ttest wage, by(obs)
                      
                      Two-sample t test with equal variances
                      ------------------------------------------------------------------------------
                         Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
                      ---------+--------------------------------------------------------------------
                             0 |     378    8.661772    .5400464     10.4997    7.599892    9.723653
                             1 |   1,868    7.585877    .0964482    4.168528    7.396719    7.775034
                      ---------+--------------------------------------------------------------------
                      combined |   2,246    7.766949    .1214451    5.755523    7.528793    8.005105
                      ---------+--------------------------------------------------------------------
                          diff |            1.075896     .323882                .4407558    1.711035
                      ------------------------------------------------------------------------------
                          diff = mean(0) - mean(1)                                      t =   3.3219
                      Ho: diff = 0                                     degrees of freedom =     2244
                      
                          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                       Pr(T < t) = 0.9995         Pr(|T| > |t|) = 0.0009          Pr(T > t) = 0.0005
                      
                      * Also, on the same verge:
                      
                      . reg wage obs
                      
                            Source |       SS           df       MS      Number of obs   =     2,246
                      -------------+----------------------------------   F(1, 2244)      =     11.03
                             Model |  363.914299         1  363.914299   Prob > F        =    0.0009
                          Residual |  74004.0531     2,244  32.9786333   R-squared       =    0.0049
                      -------------+----------------------------------   Adj R-squared   =    0.0044
                             Total |  74367.9674     2,245  33.1260434   Root MSE        =    5.7427
                      
                      ------------------------------------------------------------------------------
                              wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               obs |  -1.075896    .323882    -3.32   0.001    -1.711035   -.4407558
                             _cons |   8.661772   .2953728    29.32   0.000      8.08254    9.241005
                      ------------------------------------------------------------------------------
                      Considering that the dependent variable differed (significantly) between the missing and nonmissing groups, couldn't we state that the MAR assumption was violated?

                      Thanks in advance.

                      Edited to add: since the SDs differ considerably between groups, I also double-checked using Satterthwaite's method, which gave a p-value of 0.0505.
                      Last edited by Marcos Almeida; 05 Jun 2018, 05:43.
                      Best regards,

                      Marcos

                      • #12
                        MAR assumes that the chance of getting a missing value on a variable X does not depend on the unobserved values of X. The chance of getting a missing value on X may, however, depend on observed values of other variables. This is what allows MI to correct for (some of) the bias due to missing values. So your test is not a test of MAR. In fact, MAR is by definition untestable.
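
                        A small simulation (purely illustrative, not from the thread's data) shows why the t-test in #11 does not test MAR: even when missingness depends only on an observed covariate, the observed and missing groups can still differ in Y, because Y is correlated with that covariate.

```stata
// illustrative simulation: a MAR mechanism still produces a group difference in y
clear
set obs 2000
set seed 12345
gen x = rnormal()
gen y = x + rnormal()
// missingness in y depends only on the observed x (hence MAR) ...
gen byte obs_y = runiform() > invlogit(x)
// ... yet observed and missing groups still differ in y, since y depends on x
ttest y, by(obs_y)
```

                        So a significant t-test is compatible with both MAR and MNAR mechanisms, which is why the observed data alone cannot distinguish them.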

                        • #13
                          Thank you for the clarifying reply.
                          Best regards,

                          Marcos

                          • #14
                            Originally posted by Maarten Buis View Post
                            The way I used weighting cannot be used to correct for missing values on the dependent variable.

                            As to multiple imputation, your reading of the literature is wrong: You should definitely impute missing values on the dependent variable. Not doing so will seriously bias your imputation results.

                             It is even more complex than that: not including your dependent variable in the imputation model is seriously wrong. So it has to be in the imputation models, and the imputations must be used. However, in a correct imputation model the imputations of the dependent variable should not change much compared to a model that simply leaves those missing values out. That has to do with the MAR assumption that underlies multiple imputation and the fact that missing values bias results when the missingness depends on the dependent variable. So if you find a difference between your MI model and a regular model, that is due to imputing the independent variables.

                             If you worry about the impact of missing values on the dependent variable, then you have to relax the MAR assumption. Now you are in trouble. Models for that exist, e.g. heckman. However, for my taste, they depend too much on all kinds of untestable assumptions.
                            Thank you for the detailed explanation!

                            Raoping
