  • Missing rho test in svy, subpop(): heckprobit

    Dear All,

    I am running a probit regression with sample selection, where the parameter rho indicates the presence of sample selection. Because my data have a complex survey design, I am using the svy prefix with the subpop() option.

    However, when I run the model, I do not get the Wald test result for rho. I understand that lrtest cannot be performed in the presence of weights, but I am not sure why a Wald test result is not available either.

    For instance, when I run the following (hypothetical example):
    webuse nhanes2f
    svyset psuid [pweight=finalwgt], strata(stratid)
    heckprob heartatk black zinc age age2 weight, select(rural = female orace)
    heckprob heartatk black zinc age age2 weight, select(rural = female orace) vce(robust)
    In both instances, I get a test of rho = 0 at the end of the output (a likelihood-ratio test in the first case and a Wald test in the second).

    But when I run:
    svy: heckprob heartatk black zinc age age2 weight, select(rural = female orace)
    svy, subpop(black): heckprob heartatk zinc age age2 weight, select(rural = female orace)
    In neither case do I get the Wald test result. I also tried testparm _b[rho], but it turns out rho is stored as a scalar.

    Please kindly assist. Thank you in advance.

  • #2
    See reply #17 by Jeff Pitblado (StataCorp) in the following thread on statistics based on the fitted log likelihood in -svy- estimation. He recommends that, in the absence of stratification, you can use -pweight-s instead of -svy- estimation and cluster on the PSU variable. So in your example, ignoring the stratification and assuming that "psuid" is your PSU variable (it should have more than 30 levels), something like:

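    webuse nhanes2f, clear
    set seed 03152022
    * For illustration only: overwrite psuid with 200 random cluster IDs
    replace psuid= runiformint(1, 200)
    * Design with PSUs and weights but no strata
    svyset psuid [pweight=finalwgt]
    * svy estimation: no test of rho is reported
    svy: heckprob heartatk black zinc age age2 weight, select(rural = female orace)
    * pweights + cluster-robust VCE: reports a Wald test of rho = 0
    heckprob heartatk black zinc age age2 weight [pweight=finalwgt], select(rural = female orace) vce(cluster psuid) nolog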

    . svy: heckprob heartatk black zinc age age2 weight, select(rural = female orace)
    (running heckprob on estimation sample)
    Survey: Probit model with sample selection
    Number of strata   =         1                Number of obs     =        9,957
    Number of PSUs     =       200                Population size   =  113,438,880
                                                  Design df         =          199
                                                  F(   5,    195)   =         1.53
                                                  Prob > F          =       0.1820
                 |             Linearized
                 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    heartatk     |
           black |  -.0627992    .049964    -1.26   0.210    -.1613261    .0357276
            zinc |   .0003236   .0007469     0.43   0.665    -.0011492    .0017964
             age |   .0298621   .0173487     1.72   0.087    -.0043488     .064073
            age2 |   -.000204   .0001283    -1.59   0.113    -.0004571    .0000491
          weight |   .0003338   .0008906     0.37   0.708    -.0014223      .00209
           _cons |  -.5114438   .6924987    -0.74   0.461    -1.877021    .8541336
    rural        |
          female |   -.097242   .0297931    -3.26   0.001    -.1559927   -.0384912
           orace |  -.5042324   .1298253    -3.88   0.000    -.7602422   -.2482226
           _cons |   -.475335   .0207137   -22.95   0.000    -.5161816   -.4344884
         /athrho |  -3.105201   .7721744    -4.02   0.000    -4.627895   -1.582506
             rho |  -.9959912   .0061786                     -.9998089   -.9189924
    . heckprob heartatk black zinc age age2 weight [pweight=finalwgt], select(rural = female orace) vce(cluster psuid) nolog
    Probit model with sample selection              Number of obs     =      9,957
                                                          Selected    =      3,417
                                                          Nonselected =      6,540
                                                    Wald chi2(5)      =       7.81
    Log pseudolikelihood = -7.28e+07                Prob > chi2       =     0.1671
                                    (Std. Err. adjusted for 200 clusters in psuid)
                 |               Robust
                 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    heartatk     |
           black |  -.0627992    .049964    -1.26   0.209    -.1607269    .0351284
            zinc |   .0003236   .0007469     0.43   0.665    -.0011403    .0017875
             age |   .0298621   .0173487     1.72   0.085    -.0041407     .063865
            age2 |   -.000204   .0001283    -1.59   0.112    -.0004556    .0000475
          weight |   .0003338   .0008906     0.37   0.708    -.0014116    .0020793
           _cons |  -.5114438   .6924987    -0.74   0.460    -1.868716    .8458288
    rural        |
          female |   -.097242   .0297931    -3.26   0.001    -.1556354   -.0388485
           orace |  -.5042324   .1298253    -3.88   0.000    -.7586853   -.2497795
           _cons |   -.475335   .0207137   -22.95   0.000    -.5159331   -.4347368
         /athrho |  -3.105201   .7721744    -4.02   0.000    -4.618635   -1.591766
             rho |  -.9959912   .0061786                     -.9998053   -.9204197
    Wald test of indep. eqns. (rho = 0): chi2(1) =    16.17   Prob > chi2 = 0.0001
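    The last line can be reproduced by hand from the /athrho row, which is one way to test rho manually from a coefficient and its standard error. A worked sketch, assuming (as the matching numbers suggest) that the test is computed on the athrho scale, where the Wald statistic is simply the squared z statistic:

    \[
    W = \left(\frac{\widehat{\text{athrho}}}{\widehat{\text{se}}(\widehat{\text{athrho}})}\right)^2
      = \left(\frac{-3.105201}{0.7721744}\right)^2
      \approx (-4.0214)^2 \approx 16.17 \sim \chi^2(1)
    \]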
    • #3
      Thank you so much. So, in the presence of stratification, do I just have to accept that? Is there any way to test rho manually, given that I have the coefficient, standard error, and confidence interval?


      • #4
        My understanding is that statistics that depend on the log likelihood, such as likelihood-ratio tests, are invalid with -svy- estimation.


        • #5
          Originally posted by Andrew Musau:
          My understanding is that such statistics which depend on the log-likelihood are invalid with -svy- estimation.
          Thanks a lot for the help!


          • #6
            I found the following Stata FAQ that gives more details: likelihood-ratio tests are invalid both with pweighted data and with -svy- estimation, whereas Wald tests are valid in both cases. With -svy- estimation you get an adjusted Wald test; the adjustment matters when the total number of clusters is small (roughly 100 or fewer). So just test whether the coefficient on /athrho is equal to zero. Note that /athrho is just a transformation of rho and equals zero exactly when rho does.
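            For reference, a sketch of that transformation, assuming heckprobit's usual parameterization of rho through its inverse hyperbolic tangent. Because tanh is strictly increasing with tanh(0) = 0, the two null hypotheses coincide, and the point estimate /athrho = -3.105201 reported in the outputs maps back to the displayed rho:

            \[
            \text{athrho} = \operatorname{atanh}(\rho) = \tfrac{1}{2}\ln\frac{1+\rho}{1-\rho},
            \qquad \rho = \tanh(\text{athrho})
            \]
            \[
            \rho = 0 \iff \text{athrho} = 0,
            \qquad \tanh(-3.105201) \approx -0.99599
            \]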

            webuse nhanes2f, clear
            set seed 03152022
            * For illustration only: 95 random clusters (fewer than ~100)
            replace psuid= runiformint(1, 95)
            svyset psuid [pweight=finalwgt]
            * pweights + cluster-robust VCE: chi-squared Wald test
            heckprob heartatk black zinc age age2 weight [pweight=finalwgt], select(rural = female orace) vce(cluster psuid) nolog
            test _b[/athrho]=0
            * svy estimation: adjusted Wald (F) test
            svy: heckprob heartatk black zinc age age2 weight, select(rural = female orace)
            test _b[/athrho]=0

            . heckprob heartatk black zinc age age2 weight [pweight=finalwgt], select(rural = female orace) vce(cluster psuid) nolog
            Probit model with sample selection              Number of obs     =      9,957
                                                                  Selected    =      3,417
                                                                  Nonselected =      6,540
                                                            Wald chi2(5)      =       6.64
            Log pseudolikelihood = -7.28e+07                Prob > chi2       =     0.2491
                                             (Std. Err. adjusted for 95 clusters in psuid)
                         |               Robust
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            heartatk     |
                   black |  -.0627992   .0431665    -1.45   0.146    -.1474041    .0218056
                    zinc |   .0003236   .0007356     0.44   0.660    -.0011182    .0017654
                     age |   .0298621   .0168563     1.77   0.076    -.0031757    .0628999
                    age2 |   -.000204   .0001258    -1.62   0.105    -.0004506    .0000426
                  weight |   .0003338   .0008814     0.38   0.705    -.0013936    .0020613
                   _cons |  -.5114438   .6634953    -0.77   0.441    -1.811871    .7889831
            rural        |
                  female |   -.097242   .0311652    -3.12   0.002    -.1583246   -.0361593
                   orace |  -.5042324   .1375608    -3.67   0.000    -.7738466   -.2346181
                   _cons |   -.475335    .021763   -21.84   0.000    -.5179897   -.4326803
                 /athrho |  -3.105201   .7593972    -4.09   0.000    -4.593592   -1.616809
                     rho |  -.9959912   .0060764                     -.9997953     -.92416
            Wald test of indep. eqns. (rho = 0): chi2(1) =    16.72   Prob > chi2 = 0.0000
            . test  _b[/athrho]=0
             ( 1)  [/]athrho = 0
                       chi2(  1) =   16.72
                     Prob > chi2 =    0.0000
            . svy: heckprob heartatk black zinc age age2 weight, select(rural = female orace)
            (running heckprob on estimation sample)
            Survey: Probit model with sample selection
            Number of strata   =         1                Number of obs     =        9,957
            Number of PSUs     =        95                Population size   =  113,438,880
                                                          Design df         =           94
                                                          F(   5,     90)   =         1.27
                                                          Prob > F          =       0.2835
                         |             Linearized
                         |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
            heartatk     |
                   black |  -.0627992   .0431665    -1.45   0.149    -.1485074    .0229089
                    zinc |   .0003236   .0007356     0.44   0.661     -.001137    .0017842
                     age |   .0298621   .0168563     1.77   0.080    -.0036065    .0633308
                    age2 |   -.000204   .0001258    -1.62   0.108    -.0004538    .0000458
                  weight |   .0003338   .0008814     0.38   0.706    -.0014162    .0020838
                   _cons |  -.5114438   .6634953    -0.77   0.443    -1.828829    .8059417
            rural        |
                  female |   -.097242   .0311652    -3.12   0.002    -.1591212   -.0353628
                   orace |  -.5042324   .1375608    -3.67   0.000    -.7773626   -.2311022
                   _cons |   -.475335    .021763   -21.84   0.000     -.518546    -.432124
                 /athrho |  -3.105201   .7593972    -4.09   0.000    -4.613001     -1.5974
                     rho |  -.9959912   .0060764                     -.9998031   -.9212762
            . test  _b[/athrho]=0
            Adjusted Wald test
             ( 1)  [/]athrho = 0
                   F(  1,    94) =   16.72
                        Prob > F =    0.0001
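            As I understand Stata's survey documentation, the adjusted Wald statistic rescales the chi-squared Wald statistic W for k restrictions and d design degrees of freedom as below. With a single restriction (k = 1) the factor equals 1, which is why both tests of /athrho above give 16.72; the same formula reproduces the model F statistics from the chi2(5) values in the two pairs of outputs in this thread:

            \[
            F = \frac{d-k+1}{d\,k}\,W \sim F(k,\; d-k+1)
            \]
            \[
            k=1,\ d=94:\quad F = W = \left(\frac{-3.105201}{0.7593972}\right)^2 \approx 16.72 \sim F(1,\,94)
            \]
            \[
            k=5,\ d=94:\quad F = \frac{90}{470}\times 6.64 \approx 1.27 \sim F(5,\,90),
            \qquad
            k=5,\ d=199:\quad F = \frac{195}{995}\times 7.81 \approx 1.53 \sim F(5,\,195)
            \]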
            • #7
              This was incredibly helpful. Thank you so very much!

