
  • VIF with svy

    Hi,

    I am using the svy prefix to run my glm, and I know the vif command is not available with the svy prefix.

    I found this command:
    display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

    but I am not sure whether it is the right way to go, so I was wondering: is there a way to get the VIF after running svy: glm?
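
    (For clarity: the standard tolerance/VIF for a predictor uses the R-squared from an auxiliary regression of that predictor on the other predictors, so e(r2) would have to come from such a regression rather than from the main model. A minimal sketch with illustrative predictor names x1, x2, and x3, not the actual variables:)

    Code:
    * Hedged sketch: tolerance/VIF for x1, using illustrative names x1-x3.
    * e(r2) comes from an auxiliary regression of x1 on the other predictors.
    regress x1 x2 x3
    display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))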

    Thanks,
    Amira

  • #2
    Hi Amira!
    Try the uncentered option, i.e. vif, uncentered.
    You can also try the command coldiag; see Stata's help (help coldiag).
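
    A minimal sketch of the uncentered-VIF suggestion on the auto data (illustrative only, not Amira's survey model); estat vif takes the same uncentered option after regress:

    Code:
    * Illustrative only: uncentered VIFs after an ordinary (unweighted) regression
    sysuse auto, clear
    regress mpg length headroom
    estat vif, uncentered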



    • #3
      Please note:
      Before using Stata's help for coldiag, you have to install the command first: ssc install coldiag



      • #4
        Hi Williams,

        Thanks a lot for your advice. I installed coldiag and ran the command on my Xs. However, I need to double-check that it is the correct way to go when some of the Xs are factor variables. Any advice?



        • #5
          I also used coldiag2, and I think it could also be used for any type of predictor. Any thoughts?



          • #6
            Hi!
            Yes, coldiag2 seems suitable for any type of predictor. You can also try the command perturb (ssc install perturb).



            • #7
              Thanks a million



              • #8
                Unfortunately, neither coldiag nor coldiag2 takes probability weights, so the VIFs are not correct for weighted survey data. estat vif will produce VIFs after regress, but not after svy: regress. However, the only part of the survey design that affects collinearity is the weights, so you can run regress with probability weights:

                Code:
                sysuse auto, clear
                regress mpg length i.rep78 headroom [pw = turn]
                estat vif
                with this result:
                Code:
                . regress mpg length i.rep78 headroom [pw = turn]
                (sum of wgt is   2.7460e+03)
                
                Linear regression                               Number of obs     =         69
                                                                F(6, 62)          =      36.70
                                                                Prob > F          =     0.0000
                                                                R-squared         =     0.6912
                                                                Root MSE          =     3.3373
                
                ------------------------------------------------------------------------------
                             |               Robust
                         mpg |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                -------------+----------------------------------------------------------------
                      length |  -.1926853   .0201168    -9.58   0.000    -.2328982   -.1524725
                             |
                       rep78 |
                          2  |  -.3934802   1.432463    -0.27   0.784    -3.256932    2.469972
                          3  |   -1.00596   1.146483    -0.88   0.384    -3.297747    1.285827
                          4  |  -.4478296   1.135751    -0.39   0.695    -2.718163    1.822504
                          5  |   2.394037   2.294841     1.04   0.301    -2.193286    6.981359
                             |
                    headroom |   .3067015   .4274639     0.72   0.476    -.5477867     1.16119
                       _cons |    56.8518   3.572045    15.92   0.000     49.71138    63.99221
                ------------------------------------------------------------------------------
                
                
                . estat vif
                
                    Variable |       VIF       1/VIF  
                -------------+----------------------
                      length |      1.56    0.642325
                       rep78 |
                          2  |      5.11    0.195578
                          3  |      9.75    0.102527
                          4  |      7.76    0.128898
                          5  |      5.40    0.185164
                    headroom |      1.54    0.651181
                -------------+----------------------
                    Mean VIF |      5.19
                I don't find the VIFs for the individual factor variables to be very informative.
                Steve Samuels
                Statistical Consulting
                [email protected]

                Stata 14.2



                • #9
                  Thanks Steve,

                  As far as I know (and I am not an expert, for sure), perturb can be used with any kind of predictor. However, I am not sure of its full validity; any thoughts?

                  Also, when I tried to use perturb I could not manage to find the right syntax for it. If you have tried it before, I would be grateful if you could let me know the exact syntax.

                  I tried regress followed by vif as you recommended, but since most of my predictors are categorical I am not sure it is the right way to go. The output showed at the beginning that 2 of my factor variables were omitted because of collinearity, so should I remove those 2 variables in future analyses?
                  Also, the mean VIF was 3.13; I am not sure whether that indicates a collinearity problem.

                  My original model is a glm, not regress, and vif did not work after glm.

                  So I would appreciate any help on how to proceed; I am getting confused.

                  Thanks again



                  • #10
                    Here is an example of working perturb code. Continuous variables are listed in the pvars() option, with their perturbation ranges in prange(); categorical variables go in pfac(), with reclassification percentages in pcnt(). The trick is to use the i.variable specification only for categorical variables with > 2 levels. Notice "foreign" in the example below.

                    Code:
                    sysuse auto, clear
                    perturb: xi: reg  length turn headroom i.rep78 foreign, ///
                    poptions(pvars(turn headroom) prange(5 5) pfac(rep78 foreign) pcnt(96 96) )

                    coldiag and coldiag2 are appropriate only for least squares regression, because they operate on the X'X matrix. For other glm models the corresponding matrix will be different: for maximum likelihood it will be the information matrix; for vce(robust) it will be the inverse of the Huber-White variance-covariance matrix (Hill and Adkins, 2003).

                    For vif and condition-number approaches, one must treat dummy variables as a group. Fox and Monette (1992) have some proposals.


                    References:

                    Hill, R Carter, and Lee C Adkins. 2003. Chapter 12: Collinearity. In A Companion to Theoretical Econometrics, ed. BH Baltagi, 256-278. Oxford: Blackwell Publishing.

                    Fox, John, and Georges Monette. 1992. Generalized collinearity diagnostics. Journal of the American Statistical Association 87, no. 417: 178-183.

                    Steve Samuels
                    Statistical Consulting
                    [email protected]

                    Stata 14.2



                    • #11
                      I didn't answer your first question about the utility of perturb. The equivalence of a formal perturbation ("local influence") measure and the VIF is proved by Schall and Dunne (1992). perturb doesn't calculate a formal perturbation measure, but it is based on the same notion.

                      John Hendrickx, the author of coldiag2, stated in the perturb help that it could be used for models other than regression, as the general problem is extreme multiple correlation of the predictors. He did not state that it could be used for categorical predictors. To quote the help:
                      perturb can be used with estimation procedures other than regress. On the
                      other hand, collinearity is a result of extreme (multiple) correlation among
                      independent variables. Collinearity could therefore be diagnosed by running
                      regress with an arbitrary dependent variable to use perturb, vif and/or collin
                      to assess collinearity. This will certainly be a faster solution since maximum
                      likelihood procedures require iterative solutions whereas ols regression does
                      not. It is possible though that ML procedures are more sensitive to
                      collinearity, in which case perturb would be the preferred solution.
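
                      A minimal sketch of the approach in that quote, using the auto data and an assumed, arbitrary outcome variable y_arbitrary (illustrative only):

                      Code:
                      * Sketch of the quoted idea: collinearity depends only on the predictors,
                      * so regress an arbitrary dependent variable on them and inspect the VIFs.
                      sysuse auto, clear
                      set seed 12345
                      generate double y_arbitrary = runiform()
                      regress y_arbitrary length turn headroom i.rep78
                      estat vif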


                      Reference:
                      Schall, Robert, and Timothy T Dunne. 1992. A note on the relationship between parameter collinearity and local influence. Biometrika 79, no. 2: 399-404.

                      Steve Samuels
                      Statistical Consulting
                      [email protected]

                      Stata 14.2



                      • #12
                        Thanks Steve for the code and the references.

                        I tried to run perturb on my variables, but I have 31 variables in my model and it keeps giving me errors about the variables, for example 'unrecognized command: x1 invalid command name'. Do you think perturb cannot handle that many variables?



                        • #13
                          Sorry, I don't know the limits of perturb. How do you know that your model needs 31 predictors? It looks like you should do some variable selection before you go further. And before you post any more, read and absorb http://www.statalist.org/forums/help#stata , especially the section on posting Stata code.
                          Steve Samuels
                          Statistical Consulting
                          [email protected]

                          Stata 14.2



                          • #14
                            When I try perturb with all the variables, I get an error:

                            Code:
                             perturb: xi: reg  KS4_PTSTNEWG W2BenTotAnAm W2GrssyrHH KS4_CVAP3APS KS4_AGE_START i.W2FeFinMP0c i.W2hiqualgmum i.W2hiqualgdad i.W2nssecmum 
                            > i.W2nssecdad i.W2condur6MP i.W2condur5MP i.W2Hous12HH i.W2famtyp i.W1ethgrpYP i.W1sexYP i.W2senMP i.W1relig1YP KS4_IDACI i.urbind i.IndScho
                            > ol i.W1kidskolMP i.W2schlifMP i.phaseofEdu i.LM6 i.ELMPC1recoded i.QP5 i.OE3 i.LM5 i.ESELEW4recoded i.PDW7 i.PDW10 i.PDW9 i.AS7, poptions(p
                            > vars(W2BenTotAnAm W2GrssyrHH KS4_CVAP3APS KS4_AGE_START KS4_IDACI) prange(1 1 1 1 1) pfac(W2FeFinMP0c W2hiqualgmum W2hiqualgdad W2nssecmum 
                            > W2nssecdad W2condur6MP W2condur5MP W2Hous12HH W2famtyp W1ethgrpYP W1sexYP W2senMP W1relig1YP urbind IndSchool W1kidskolMP W2schlifMP phaseo
                            > fEdu LM6 ELMPC1recoded QP5 OE3 LM5 ESELEW4recoded PDW7 PDW10 PDW9 AS7) pcnt(95 95 95 95 95 95 95 95 95  95 95 95 95 95 95 95 95 95 95 95 95
                            >  95 95 95 95 95) )
                            unrecognized command:  i.phaseofEdu invalid command name
                            r(199);
                            Just to spread the info, I contacted John Hendrickx and this was his reply:

                            Hi Amira,

                            I'm afraid I haven't had access to Stata for years so I can't really shed any light on what the problem might be. If you're familiar with R, then you might try the perturb version in R, available through CRAN.

                            What you can also try is a simpler model with less variables. If the perturb macro works then try a model with more variables. I think a model with 31 categorical variables is a lot, you'll need a lot of data to estimate them all with a reasonable amount of accuracy.

                            Good luck,
                            John Hendrickx
                            I replied to him that I did try perturb with just a couple of variables and it worked:

                            Code:
                            perturb: xi: reg  KS4_PTSTNEWG KS4_CVAP3APS W1ethgrpYP, poptions(pvars(KS4_CVAP3APS) prange(1) pfac(W1ethgrpYP) pcnt(95) )
                            
                                  Source |       SS       df       MS              Number of obs =   14827
                            -------------+------------------------------           F(  2, 14824) =13100.50
                                   Model |   235861927     2   117930964           Prob > F      =  0.0000
                                Residual |   133445995 14824  9002.02338           R-squared     =  0.6387
                            -------------+------------------------------           Adj R-squared =  0.6386
                                   Total |   369307922 14826  24909.4781           Root MSE      =  94.879
                            
                            ------------------------------------------------------------------------------
                            KS4_PTSTNEWG |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                            -------------+----------------------------------------------------------------
                            KS4_CVAP3APS |   18.85981   .1165187   161.86   0.000     18.63141     19.0882
                              W1ethgrpYP |    9.51025   .3953632    24.05   0.000     8.735289    10.28521
                                   _cons |  -294.1362   4.195203   -70.11   0.000    -302.3593   -285.9131
                            ------------------------------------------------------------------------------
                            
                            Perturb variables:
                            --------------------------------------------------
                            KS4_CVAP3APS                      normal(0,1)
                            
                            Perturb factors:
                            --------------------------------------------------
                            
                            Reclassification probabilities for W1ethgrpYP:
                            
                            -------------------------------------------------------------------------
                            original  |
                            variable  |
                            W1ethgrpY |                     reclassifed variable                     
                            P         |     1      2      3      4      5      6      7      8  Total
                            ----------+--------------------------------------------------------------
                                    1 | 0.950  0.007  0.007  0.007  0.007  0.007  0.007  0.007  1.000
                                    2 | 0.007  0.950  0.007  0.007  0.007  0.007  0.007  0.007  1.000
                                    3 | 0.007  0.007  0.950  0.007  0.007  0.007  0.007  0.007  1.000
                                    4 | 0.007  0.007  0.007  0.950  0.007  0.007  0.007  0.007  1.000
                                    5 | 0.007  0.007  0.007  0.007  0.950  0.007  0.007  0.007  1.000
                                    6 | 0.007  0.007  0.007  0.007  0.007  0.950  0.007  0.007  1.000
                                    7 | 0.007  0.007  0.007  0.007  0.007  0.007  0.950  0.007  1.000
                                    8 | 0.007  0.007  0.007  0.007  0.007  0.007  0.007  0.950  1.000
                            -------------------------------------------------------------------------
                            
                            Initial expected table based on the reclassification probabilities:
                            
                            ----------------------------------------------------------------------------------------------------
                            original  |
                            variable  |
                            W1ethgrpY |                                   reclassifed variable                                  
                            P         |        1         2         3         4         5         6         7         8     Total
                            ----------+-----------------------------------------------------------------------------------------
                                    1 |  1.0e+04    75.393    75.393    75.393    75.393    75.393    75.393    75.393   1.1e+04
                                    2 |    5.821   774.250     5.821     5.821     5.821     5.821     5.821     5.821   815.000
                                    3 |    7.279     7.279   968.050     7.279     7.279     7.279     7.279     7.279  1019.000
                                    4 |    6.879     6.879     6.879   914.850     6.879     6.879     6.879     6.879   963.000
                                    5 |    5.307     5.307     5.307     5.307   705.850     5.307     5.307     5.307   743.000
                                    6 |    4.257     4.257     4.257     4.257     4.257   566.200     4.257     4.257   596.000
                                    7 |    4.457     4.457     4.457     4.457     4.457     4.457   592.800     4.457   624.000
                                    8 |    3.064     3.064     3.064     3.064     3.064     3.064     3.064   407.550   429.000
                                      | 
                                Total |  1.0e+04   880.886  1073.229  1020.429   813.000   674.400   700.800   516.943   1.6e+04
                            ----------------------------------------------------------------------------------------------------
                            
                            The reclassification probabilities will be adjusted to let
                            the expected frequencies of the reclassified variable be equal to those of W1ethgrpYP
                            
                            The expected table will be quasi-independent
                            ln(q)=:         4.890
                            
                            Adjusted expected table:
                            
                            ----------------------------------------------------------------------------------------------------
                            original  |
                            variable  |
                            W1ethgrpY |                                   reclassifed variable                                  
                            P         |        1         2         3         4         5         6         7         8     Total
                            ----------+-----------------------------------------------------------------------------------------
                                    1 |  1.0e+04    21.178    23.775    23.090    20.185    17.996    18.432    15.148   1.1e+04
                                    2 |   21.178   761.741     6.430     6.244     5.459     4.867     4.985     4.097   815.000
                                    3 |   23.775     6.430   959.999     7.010     6.128     5.463     5.596     4.599  1019.000
                                    4 |   23.090     6.244     7.010   905.496     5.952     5.306     5.435     4.467   963.000
                                    5 |   20.185     5.459     6.128     5.952   691.982     4.639     4.751     3.905   743.000
                                    6 |   17.996     4.867     5.463     5.306     4.639   550.013     4.236     3.481   596.000
                                    7 |   18.432     4.985     5.596     5.435     4.751     4.236   577.000     3.566   624.000
                                    8 |   15.148     4.097     4.599     4.467     3.905     3.481     3.566   389.738   429.000
                                      | 
                                Total |  1.1e+04   815.000  1019.000   963.000   743.000   596.000   624.000   429.000   1.6e+04
                            ----------------------------------------------------------------------------------------------------
                            
                            Final reclassification probabilities:
                            
                            -------------------------------------------------------------------------
                            original  |
                            variable  |
                            W1ethgrpY |                     reclassifed variable                     
                            P         |     1      2      3      4      5      6      7      8  Total
                            ----------+--------------------------------------------------------------
                                    1 | 0.987  0.002  0.002  0.002  0.002  0.002  0.002  0.001  1.000
                                    2 | 0.026  0.935  0.008  0.008  0.007  0.006  0.006  0.005  1.000
                                    3 | 0.023  0.006  0.942  0.007  0.006  0.005  0.005  0.005  1.000
                                    4 | 0.024  0.006  0.007  0.940  0.006  0.006  0.006  0.005  1.000
                                    5 | 0.027  0.007  0.008  0.008  0.931  0.006  0.006  0.005  1.000
                                    6 | 0.030  0.008  0.009  0.009  0.008  0.923  0.007  0.006  1.000
                                    7 | 0.030  0.008  0.009  0.009  0.008  0.007  0.925  0.006  1.000
                                    8 | 0.035  0.010  0.011  0.010  0.009  0.008  0.008  0.908  1.000
                            -------------------------------------------------------------------------
                            
                            Impact of perturbations on coefficients after 100 iterations:
                            
                                                                                             variable |     mean       sd      min      max
                            --------------------------------------------------------------------------+------------------------------------
                             KS3 average point score (using fine grading) for contextual value added. |   18.356    0.059   18.274   18.860
                                                         W1 DV: Young person's ethnic group (grouped) |    8.608    0.160    8.244    9.510
                                                                                                _cons | -274.346    2.275 -294.136 -271.772
                            ---------------------------------------------------------------------------------------------------------------

                            So maybe I am doing something wrong, or perturb does not handle that many variables?



                            • #15
                              your last "test" is not really a test because you have no "i." variables; maybe it has to be
                              Code:
                              xi: perturb: reg ...
                              but that's just a guess

                              on the other hand, the help for perturb shows an example using xi3 so why not download that and use it?

