  • Hypothesis Test on Goodman-Kruskal Gamma Statistic

    Hi, I'm looking for some help: is there an option to run a hypothesis test or compute a p-value for the Goodman-Kruskal gamma statistic after a two-way tabulate command like the following?

    tabulate size `var', gamma

    Thanks!

  • #2
    -tabulate- leaves behind r(gamma) and r(ase_gam) (the asymptotic standard error of gamma). Under the null hypothesis that gamma is 0, their ratio has an approximately standard normal distribution for large N.

    Code:
    tabulate size `var', gamma
    local z = `r(gamma)'/`r(ase_gam)'
    local p = 2*normal(-abs(`z'))
    display "p-value = " as result %05.3f `p'



    • #3
      Hi Clyde, thanks so much for this, that's really helpful! Since gamma is a test statistic, do we have to divide "ase_gam" in the denominator of "local z" by $$\sqrt{n}$$ (the square root of the sample size) to create the standard score?



      • #4
        No. "ase" stands for asymptotic standard error. It is already on the correct scale, no square root of n needed.
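
        A quick way to check this is the minimal sketch below (not part of the original output in this thread). It duplicates every observation four times: gamma should be unchanged, while r(ase_gam) should come out at about half its previous value, because the 1/sqrt(N) scaling is already built into the ASE. The scalar names g_1 and ase_1 are purely illustrative.

        ```stata
        * Sketch: the ASE reported by -tabulate, gamma- already shrinks like 1/sqrt(N).
        sysuse auto, clear
        quietly tabulate rep78 foreign, gamma
        scalar g_1   = r(gamma)      // gamma at the original sample size
        scalar ase_1 = r(ase_gam)    // ASE at the original sample size

        * Replace each observation with 4 identical copies (N becomes 4 times larger).
        expand 4
        quietly tabulate rep78 foreign, gamma

        * Gamma should be unchanged; the ASE ratio should be close to 1/sqrt(4) = 0.5.
        display "gamma before = " scalar(g_1) ", after = " r(gamma)
        display "ASE ratio (after/before) = " r(ase_gam)/scalar(ase_1)
        ```

        If the ASE still needed dividing by the square root of n, the ratio would stay near 1 instead.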



        • #5
          Thanks Clyde that's really helpful



          • #6
            Clyde Schechter is right, but I would not put too much trust in an asymptotic standard error, any more than he would. First, as usual, it is better to do calculations for the finite sample size you have. Second, gamma is bounded, so being a little more subtle about the uncertainty costs little and may avoid absurdity.

            This was just the first example I tried; note how the mainstream normal-based procedure implies an upper confidence limit above 1 for gamma, which is impossible. Better to spend your confidence money where the probability is.


            Code:
            . sysuse auto, clear
            (1978 Automobile Data)
            
            .
            . bootstrap r(gamma), reps(10000) nodots : tab for rep78, gamma
            
            warning: Because tabulate is not an estimation command or does not set e(sample), bootstrap has no way to determine which
                     observations are used in calculating the statistics and so assumes that all observations are used. This means that no
                     observations will be excluded from the resampling because of missing values or other reasons.
            
                     If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure
                     that the dataset in memory contains only the relevant data.
            
            Bootstrap results                               Number of obs     =         74
                                                            Replications      =     10,000
            
                  command:  tabulate for rep78, gamma
                    _bs_1:  r(gamma)
            
            ------------------------------------------------------------------------------
                         |   Observed   Bootstrap                         Normal-based
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _bs_1 |   .8768116   .0673485    13.02   0.000     .7448109    1.008812
            ------------------------------------------------------------------------------
            
            .
            . estat bootstrap, percentile
            
            Bootstrap results                               Number of obs     =         74
                                                            Replications      =      10000
            
                  command:  tabulate for rep78, gamma
                    _bs_1:  r(gamma)
            
            ------------------------------------------------------------------------------
                         |    Observed               Bootstrap
                         |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                   _bs_1 |   .87681159  -.0036421   .06734853    .7142857   .9766262   (P)
            ------------------------------------------------------------------------------
            (P)    percentile confidence interval
            
            .
            end of do-file
            
            . tab for rep78, gamma
            
                       |                   Repair Record 1978
              Car type |         1          2          3          4          5 |     Total
            -----------+-------------------------------------------------------+----------
              Domestic |         2          8         27          9          2 |        48
               Foreign |         0          0          3          9          9 |        21
            -----------+-------------------------------------------------------+----------
                 Total |         2          8         30         18         11 |        69
            
                                gamma =   0.8768  ASE = 0.064

            Executive summary. Use bootstrapping to get a confidence interval. Then if you wish, check to see whether your hypothesised value is inside.
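
            The executive summary can be sketched in a few lines, under some assumptions: the hypothesised value g0 = 0.5 is made up for illustration, the seed and number of replications are arbitrary, and I am relying on -bootstrap- storing the percentile limits in e(ci_percentile) (first row lower limit, second row upper limit). Observations with missing rep78 are dropped first, which is what the warning in the output above asks for.

            ```stata
            * Sketch: percentile bootstrap CI for gamma, then check a hypothesised value.
            sysuse auto, clear
            keep if !missing(rep78, foreign)   // resample only the observations actually tabulated

            quietly bootstrap g = r(gamma), reps(1000) seed(12345) nodots: ///
                tabulate foreign rep78, gamma

            * Assumed layout of e(ci_percentile): row 1 = lower limit, row 2 = upper limit.
            matrix ci = e(ci_percentile)
            local lo = ci[1,1]
            local hi = ci[2,1]

            * g0 is a hypothetical value chosen only for illustration.
            local g0 = 0.5
            display "95% percentile CI: [" %6.4f `lo' ", " %6.4f `hi' "]"
            display "hypothesised value " `g0' " is " ///
                cond(inrange(`g0', `lo', `hi'), "inside", "outside") " the interval"
            ```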
            Last edited by Nick Cox; 31 Oct 2020, 04:05.



            • #7
              I have several scholarly (i.e., pedantic) comments here, motivated by my coming from the discipline (sociology) where Gamma gained its one-time popularity:

              1) While I think that randomization approaches (-bootstrap-, -permute-) are generally preferable, I'd defend the asymptotic performance of gamma. The large value of gamma in Nick's sample (near the upper limit) presents a tough situation. In my experience, asymptotic approaches to gamma give surprisingly good CI performance. (My recollection is that asymptotic normality here is known to depend on the number of concordant and discordant pairs, which is generally much larger than the sample size.)

              2) Gamma has a different sampling distribution under the null vs. the alternative, and there is a different formula for the standard error of gamma for an hypothesis test vs. for a confidence interval. Stata implements only the asymptotic SE formula for a CI, which will often not give a great result for a p-value.

              3) Presuming a randomization approach, if a hypothesis test rather than a CI is desired, we should choose -permute-, which simulates the sampling distribution presuming the null hypothesis is true, as opposed to -bootstrap-, which resamples the observed data and so presumes the alternative hypothesis is true.

              4) In any event, I'd almost always choose Somers' D over Gamma (-ssc describe somersd-), which was developed as a version of Gamma that explicitly recognizes that one variable is the explanatory variable and the other the response. And, per Roger Newson's publications cited in -somersd-, it has a number of interesting connections to other, more popular statistics.
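
              For point 4, a minimal sketch of what a Somers' D analysis might look like. It assumes the user-written -somersd- package has been installed from SSC, and that, as I recall from its help file, the first variable in the varlist is treated as the predictor and transf(z) requests Fisher's z transform; please check -help somersd- before relying on either.

              ```stata
              * Assumes the user-written package is installed:  ssc install somersd
              sysuse auto, clear

              * Somers' D of rep78 (response) with respect to foreign (predictor);
              * as I understand the help file, the first variable is the predictor.
              somersd foreign rep78

              * Optional (assumed option): Fisher's z transform, so that the
              * back-transformed confidence limits cannot fall outside [-1, 1].
              somersd foreign rep78, transf(z)
              ```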

              Here's an illustration of some of the points above, using the auto data modified to give a more typical sample value of Gamma.
              Code:
              . clear
              . set seed 3759
              . sysuse auto
              (1978 Automobile Data)
              
              . tab rep78 foreign, gamma
              
                  Repair |
                  Record |       Car type
                    1978 |  Domestic    Foreign |     Total
              -----------+----------------------+----------
                       1 |         2          0 |         2
                       2 |         8          0 |         8
                       3 |        27          3 |        30
                       4 |         9          9 |        18
                       5 |         2          9 |        11
              -----------+----------------------+----------
                   Total |        48         21 |        69
              
                                  gamma =   0.8768  ASE = 0.064
              
              . // Change data to give a more typical value of Gamma
              . replace foreign =!foreign if (runiform() > 0.7)
              (28 real changes made)
              
              . tab rep78 foreign, gamma
              
                  Repair |
                  Record |       Car type
                    1978 |  Domestic    Foreign |     Total
              -----------+----------------------+----------
                       1 |         2          0 |         2
                       2 |         5          3 |         8
                       3 |        17         13 |        30
                       4 |         9          9 |        18
                       5 |         3          8 |        11
              -----------+----------------------+----------
                   Total |        36         33 |        69
              
                                  gamma =   0.3625  ASE = 0.169
              
              . local gamma = r(gamma)
              . local se1 = r(ase_gam)
              . local width = invnormal(0.975) * `se1'
              
              . di "By formula: LCL = " r(gamma) - `width' ", UCL = " r(gamma) + `width'
              By formula: LCL = .03215211, UCL = .69275934
              
              . quiet bootstrap g = r(gamma), saving(`temp') reps(10000) bca nodots: ///
              >    tab rep78 foreign, gamma
              
              . di "By bootstrap: LCL = " el(r(table), 5, 1) ", UCL = "el(r(table), 6, 1)
              By bootstrap: LCL = .03441267, UCL = .69049878
              
              . //
              . tempfile temp
              . quiet permute foreign g = r(gamma), right saving(`temp') reps(10000) nodots: ///
              >    tab rep78 foreign, gamma
              
              . di "p from permute = " el(r(p), 1,1)  
              p from permute = .0223
              
              . di "p from CI ASE = " 1-normal(`gamma'/`se1')
              p from CI ASE = .01574801
              
              . quiet use `temp', clear
              . quiet summ g
              . di "asymptotic SE of gamma = " %6.4f `se1' ", permute SE of gamma = " %6.4f r(sd)
              asymptotic SE of gamma = 0.1685, permute SE of gamma = 0.1839
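
              Point 2 can be made concrete with the two standard-error formulas as I remember them from the usual software documentation; treat the notation as my own. Here n_ij are the cell counts, n is the total, C_ij is the number of observations concordant with cell (i,j) (both variables higher or both lower), and D_ij the number discordant with it.

              ```latex
              P = \sum_{i,j} n_{ij} C_{ij}, \qquad
              Q = \sum_{i,j} n_{ij} D_{ij}, \qquad
              \hat\gamma = \frac{P - Q}{P + Q}

              % standard error for a confidence interval (what -tabulate- reports)
              \mathrm{ASE}_1 = \frac{4}{(P+Q)^2}
                  \sqrt{\sum_{i,j} n_{ij}\,\bigl(Q\,C_{ij} - P\,D_{ij}\bigr)^2}

              % standard error under the null hypothesis gamma = 0 (for a test)
              \mathrm{ASE}_0 = \frac{2}{P+Q}
                  \sqrt{\sum_{i,j} n_{ij}\,\bigl(C_{ij} - D_{ij}\bigr)^2 - \frac{(P-Q)^2}{n}}
              ```

              For the modified table above, my hand calculation gives P = 1154 and Q = 540, which reproduces gamma = 0.3625 and ASE_1 of about 0.1685, and gives ASE_0 of about 0.176. That is closer to the permute SE of 0.1839 than the CI-based value is, and the corresponding one-sided p-value (roughly 0.02) sits between the two p-values shown above. The hand arithmetic is unchecked by software, so verify it before quoting it.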
