  • How to conduct sensitivity analysis in multiple linear regression model?

    Dear all,

    When missingness occurs on both the dependent and the independent variables, how should one conduct a sensitivity analysis?

    Look forward to your reply!


    Best regards,
    Raoping

  • #2
    You first need to define what kind of sensitivity you are interested in investigating. That will help you find a family of models you could estimate. You estimate them, and you see if they result in different findings.

    This is a very general answer. If you give us more details, then we can try to give you a more specific answer.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Raoping:
      as an aside to Maarten's wise reply: when dealing with missing data, a sensitivity analysis is usually recommended when values are missing not at random (see http://www.stefvanbuuren.nl/publicat...ed%201999.pdf; paragraph 3.4).
      Kind regards,
      Carlo
      (StataNow 18.5)

      • #4
        Originally posted by Carlo Lazzaro View Post
        Raoping:
        as an aside to Maarten's wise reply: when dealing with missing data, a sensitivity analysis is usually recommended when values are missing not at random (see http://www.stefvanbuuren.nl/publicat...ed%201999.pdf; paragraph 3.4).
        Hi,
        Carlo, the link you shared is not working. Would you please let me know the title and authors of that document?


        Thank you!

        Raoping

        • #5
          Raoping:
          sorry for the mishap.
          It works for me when clicked on:
          http://www.stefvanbuuren.nl/publicat...Med%201999.pdf
          Kind regards,
          Carlo
          (StataNow 18.5)

          • #6
            Originally posted by Carlo Lazzaro View Post
            Raoping:
            sorry for the mishap.
            It works for me when clicked on:
            http://www.stefvanbuuren.nl/publicat...Med%201999.pdf
            Thank you! Now it is working.


            Raoping

            • #7
              Originally posted by Maarten Buis View Post
              You first need to define what kind of sensitivity you are interested in investigating. That will help you find a family of models you could estimate. You estimate them, and you see if they result in different findings.

              This is a very general answer. If you give us more details, then we can try to give you a more specific answer.
              Dear Maarten,

              I would like to examine the association between psychological distress and C-reactive protein (CRP) among people with different levels of education. Unfortunately, 30% of the CRP values (the dependent variable) are missing, and almost 20% are missing in BMI (a covariate). I want to see whether the 30% missingness in CRP biases my analysis, that is, whether my results are stable.


              Thank you!

              Best regards,
              Raoping

              • #8
                So you want to look at the impact of missing values on your estimate. When you talk about robustness, you are actually asking how different models (for dealing with missing values) lead to similar or different results. So you need to estimate different models for dealing with missing values. The simplest is to ignore all observations with at least one missing value; this is what Stata does if you estimate a "normal" model. You can use mi for your second model (see help mi), but for quickly getting a first impression I tend to prefer weighting. You can quickly compute the weights that adjust for missing values yourself and compare a weighted and an unweighted model. You cannot control for missing values in the dependent/explained/y-variable this way, but as a first impression this has served me well.

                Code:
                // open example dataset
                sysuse nlsw88, clear
                
                // compute weights
                gen obs = !missing(union, tenure) // binary variable: 0 missing on the xs, 1 observed on the xs
                xtile cat = wage, nq(10)  // split the dependent variable up in 10 equally well filled groups
                logit obs i.cat // how does the chance of being observed depend on the dependent variable
                predict double w if wage < ., pr // predict chance of being observed
                replace w = 1/w // weight = 1/chance
                
                // compare weighted and unweighted model
                reg wage union tenure [pw=w]
                reg wage union tenure

                • #9
                  Originally posted by Maarten Buis View Post
                  So you want to look at the impact of missing values on your estimate. When you talk about robustness, you are actually asking how different models (for dealing with missing values) lead to similar or different results. So you need to estimate different models for dealing with missing values. The simplest is to ignore all observations with at least one missing value; this is what Stata does if you estimate a "normal" model. You can use mi for your second model (see help mi), but for quickly getting a first impression I tend to prefer weighting. You can quickly compute the weights that adjust for missing values yourself and compare a weighted and an unweighted model. You cannot control for missing values in the dependent/explained/y-variable this way, but as a first impression this has served me well.

                  Code:
                  // open example dataset
                  sysuse nlsw88, clear
                  
                  // compute weights
                  gen obs = !missing(union, tenure) // binary variable: 0 missing on the xs, 1 observed on the xs
                  xtile cat = wage, nq(10) // split the dependent variable up in 10 equally well filled groups
                  logit obs i.cat // how does the chance of being observed depend on the dependent variable
                  predict double w if wage < ., pr // predict chance of being observed
                  replace w = 1/w // weight = 1/chance
                  
                  // compare weighted and unweighted model
                  reg wage union tenure [pw=w]
                  reg wage union tenure
                  Dear Maarten,

                   Thank you for the explanation! Is the method you recommend also applicable to missing values on the dependent variable? My main purpose is to look at the impact of missing dependent values on my estimation. Much of the literature indicates that it is not a good idea to impute a missing dependent variable by multiple imputation, so now I am not sure how to proceed.


                  Look forward to your reply!


                  Raoping

                  • #10
                    The way I used weighting cannot be used to correct for missing values on the dependent variable.

                    As to multiple imputation, your reading of the literature is wrong: You should definitely impute missing values on the dependent variable. Not doing so will seriously bias your imputation results.

                    It is even more complex than that: not including your dependent variable in the imputation model is seriously wrong. So it has to be in the imputation models, and the imputations must be used. However, in a correct imputation model the imputations of the dependent variable should not change much compared to a model that simply leaves those missing values out. That has to do with the MAR assumption that underlies multiple imputation and the fact that missing values bias results when the missingness depends on the dependent variable. So if you find a difference between your MI model and a regular model, that is due to imputing the independent variables.
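
                    A minimal Stata sketch of this advice, using hypothetical variable names taken from the poster's description (crp as the dependent variable; distress, bmi, and educ as covariates; adapt to your own data):

```stata
// hypothetical variable names; incomplete variables must be registered with mi
mi set wide
mi register imputed crp bmi
// impute crp and bmi jointly, with distress and education as complete predictors;
// the dependent variable crp is part of the imputation model, as recommended above
mi impute chained (regress) crp bmi = distress i.educ, add(20) rseed(12345)
// analysis model, with estimates combined across imputations by Rubin's rules
mi estimate: regress crp distress bmi i.educ
```

                    Comparing these pooled estimates with the complete-case regress results is then the sensitivity check described above: under MAR, imputing the dependent variable should change little, so a noticeable difference points mainly to the imputed covariates.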

                    If you worry about the impact of missing values on the dependent variable, then you have to relax the MAR assumption. Now you are in trouble. Models for that exist, e.g. heckman. However, for my taste, they depend too much on all kinds of untestable assumptions.
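
                    A hedged sketch of such a selection model in Stata (variable names hypothetical; in practice heckman also wants at least one exclusion variable that predicts selection but not the outcome, here called contact):

```stata
// indicator for whether the dependent variable is observed
gen byte observed = !missing(crp)
// Heckman selection model: outcome equation plus selection equation
heckman crp distress bmi i.educ, ///
    select(observed = distress bmi i.educ contact)
```

                    The untestable assumptions mentioned above enter through the joint normality of the two error terms and through the choice of exclusion variable.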

                    • #11
                      Maarten Buis Following your example in #8, the two regression models are quite similar, so we could infer that the sensitivity analysis performs just fine (in terms of corroborating the proposed model).

                      Then, I typed afterwards:

                      Code:
                      . ttest wage, by(obs)
                      
                      Two-sample t test with equal variances
                      ------------------------------------------------------------------------------
                         Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
                      ---------+--------------------------------------------------------------------
                             0 |     378    8.661772    .5400464     10.4997    7.599892    9.723653
                             1 |   1,868    7.585877    .0964482    4.168528    7.396719    7.775034
                      ---------+--------------------------------------------------------------------
                      combined |   2,246    7.766949    .1214451    5.755523    7.528793    8.005105
                      ---------+--------------------------------------------------------------------
                          diff |            1.075896     .323882                .4407558    1.711035
                      ------------------------------------------------------------------------------
                          diff = mean(0) - mean(1)                                      t =   3.3219
                      Ho: diff = 0                                     degrees of freedom =     2244
                      
                          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
                       Pr(T < t) = 0.9995         Pr(|T| > |t|) = 0.0009          Pr(T > t) = 0.0005
                      
                      * Also, on the same verge:
                      
                      . reg wage obs
                      
                            Source |       SS           df       MS      Number of obs   =     2,246
                      -------------+----------------------------------   F(1, 2244)      =     11.03
                             Model |  363.914299         1  363.914299   Prob > F        =    0.0009
                          Residual |  74004.0531     2,244  32.9786333   R-squared       =    0.0049
                      -------------+----------------------------------   Adj R-squared   =    0.0044
                             Total |  74367.9674     2,245  33.1260434   Root MSE        =    5.7427
                      
                      ------------------------------------------------------------------------------
                              wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                      -------------+----------------------------------------------------------------
                               obs |  -1.075896    .323882    -3.32   0.001    -1.711035   -.4407558
                             _cons |   8.661772   .2953728    29.32   0.000      8.08254    9.241005
                      ------------------------------------------------------------------------------
                      Considering that the dependent variable differed (significantly) between the missing and nonmissing groups, couldn't we state that the MAR assumption was violated?

                      Thanks in advance.

                      Edited to add: since the SDs differ considerably between groups, I also double-checked using Satterthwaite's method, which gave a p-value of 0.0505.
                      Last edited by Marcos Almeida; 05 Jun 2018, 05:43.
                      Best regards,

                      Marcos

                      • #12
                        MAR assumes that the chance of getting a missing value on a variable X does not depend on the unobserved values of X. The chance of getting a missing value on X may, however, depend on observed values of other variables. This is what allows MI to correct for (some of) the bias due to missing values. So your test is not a test of MAR. In fact, MAR is by definition untestable.
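
                        A small simulation (purely illustrative, not from the thread's data) shows why the t-test in #11 does not test MAR: even when missingness depends only on an observed covariate, the observed and missing groups can still differ in Y, because Y is correlated with that covariate.

```stata
// illustrative simulation: a MAR mechanism still produces a group difference in y
clear
set obs 2000
set seed 12345
gen x = rnormal()
gen y = x + rnormal()
// missingness in y depends only on the observed x (hence MAR) ...
gen byte obs_y = runiform() > invlogit(x)
// ... yet observed and missing groups still differ in y, since y depends on x
ttest y, by(obs_y)
```

                        So a significant t-test is compatible with both MAR and MNAR mechanisms, which is why the observed data alone cannot distinguish them.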

                        • #13
                          Thank you for the clarifying reply.
                          Best regards,

                          Marcos

                          • #14
                            Originally posted by Maarten Buis View Post
                            The way I used weighting cannot be used to correct for missing values on the dependent variable.

                            As to multiple imputation, your reading of the literature is wrong: You should definitely impute missing values on the dependent variable. Not doing so will seriously bias your imputation results.

                             It is even more complex than that: not including your dependent variable in the imputation model is seriously wrong. So it has to be in the imputation models, and the imputations must be used. However, in a correct imputation model the imputations of the dependent variable should not change much compared to a model that simply leaves those missing values out. That has to do with the MAR assumption that underlies multiple imputation and the fact that missing values bias results when the missingness depends on the dependent variable. So if you find a difference between your MI model and a regular model, that is due to imputing the independent variables.

                             If you worry about the impact of missing values on the dependent variable, then you have to relax the MAR assumption. Now you are in trouble. Models for that exist, e.g. heckman. However, for my taste, they depend too much on all kinds of untestable assumptions.
                            Thank you for the detailed explanation!

                            Raoping
