Hypothesis Test on Goodman-Kruskal Gamma Statistic

Josephine Auer

Join Date: Apr 2020

Posts: 19
#1

28 Oct 2020, 20:17

Hi, I'm looking for some help: is there an option to run a hypothesis test or compute a p-value for the Goodman-Kruskal gamma statistic after a two-way tabulate command like the following?

tabulate size `var', gamma

Thanks!
Clyde Schechter

Join Date: Apr 2014

Posts: 30096
#2

28 Oct 2020, 21:03

-tabulate- leaves behind r(gamma) and r(ase_gam) (the standard error of gamma). Their ratio will have a standard normal distribution (asymptotically for large N).

Code:

tabulate size `var', gamma
local z = `r(gamma)'/`r(ase_gam)'
local p = 2*normal(-abs(`z'))
display "p-value = " as result %05.3f `p'
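
Written out as a formula, the calculation in the code above is a two-sided large-sample z-test of gamma = 0. This is only a restatement of the code, with $$\Phi$$ denoting the standard normal CDF (Stata's normal() function):

$$z = \frac{\hat{\gamma}}{\mathrm{ASE}(\hat{\gamma})}, \qquad p = 2\,\Phi(-|z|)$$

For instance, with the values gamma = 0.3625 and ASE = 0.1685 reported further down this thread, z is about 2.15 and the two-sided p-value is about 0.031 (twice the one-sided value of 0.0157 shown there).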
Josephine Auer

Join Date: Apr 2020

Posts: 19
#3

29 Oct 2020, 21:37

Hi Clyde, thanks so much for this, that's really helpful! Do we have to divide "ase_gam" in the denominator of "local z" by $$\sqrt{n}$$ (the square root of the sample size) to create the standard score, given that gamma is a test statistic?
Clyde Schechter

Join Date: Apr 2014

Posts: 30096
#4

29 Oct 2020, 22:17

No. "ase" stands for asymptotic standard error. It is already on the correct scale, no square root of n needed.
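
A comparison with the familiar one-sample case may help here. It is only an analogy, and it assumes a test of a mean against zero:

$$z = \frac{\bar{x}}{s/\sqrt{n}}$$

The $$\sqrt{n}$$ lives inside the standard error $$s/\sqrt{n}$$; it is not applied a second time to it. The ASE reported by -tabulate- plays the role of that whole denominator, so the ratio of r(gamma) to r(ase_gam) is already the standard score.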
Josephine Auer

Join Date: Apr 2020

Posts: 19
#5

31 Oct 2020, 00:19

Thanks Clyde, that's really helpful.
Nick Cox

Join Date: Mar 2014
Posts: 35696

31 Oct 2020, 03:37

Clyde Schechter is right, but I would not put too much trust in an asymptotic standard error, any more than he would. First, as usual, it is better to do calculations for the finite sample size you have. Second, gamma is bounded, so being a little more subtle about the uncertainty costs little and may avoid absurdity.

This was just the first example I tried, and note how the mainstream (normal-based) procedure implies an upper confidence limit above 1 for gamma, which is impossible. Better to spend your confidence money where the probability is.

Code:

. sysuse auto, clear
(1978 Automobile Data)

.
. bootstrap r(gamma), reps(10000) nodots : tab for rep78, gamma

warning: Because tabulate is not an estimation command or does not set e(sample), bootstrap has no way to determine which
         observations are used in calculating the statistics and so assumes that all observations are used. This means that no
         observations will be excluded from the resampling because of missing values or other reasons.

         If the assumption is not true, press Break, save the data, and drop the observations that are to be excluded. Be sure
         that the dataset in memory contains only the relevant data.

Bootstrap results                               Number of obs     =         74
                                                Replications      =     10,000

      command:  tabulate for rep78, gamma
        _bs_1:  r(gamma)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .8768116   .0673485    13.02   0.000     .7448109    1.008812
------------------------------------------------------------------------------

.
. estat bootstrap, percentile

Bootstrap results                               Number of obs     =         74
                                                Replications      =      10000

      command:  tabulate for rep78, gamma
        _bs_1:  r(gamma)

------------------------------------------------------------------------------
             |    Observed               Bootstrap
             |       Coef.       Bias    Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .87681159  -.0036421   .06734853    .7142857   .9766262   (P)
------------------------------------------------------------------------------
(P)    percentile confidence interval

.
end of do-file

. tab for rep78, gamma

           |                   Repair Record 1978
  Car type |         1          2          3          4          5 |     Total
-----------+-------------------------------------------------------+----------
  Domestic |         2          8         27          9          2 |        48
   Foreign |         0          0          3          9          9 |        21
-----------+-------------------------------------------------------+----------
     Total |         2          8         30         18         11 |        69

                    gamma =   0.8768  ASE = 0.064

Executive summary. Use bootstrapping to get a confidence interval. Then if you wish, check to see whether your hypothesised value is inside.
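
A minimal sketch of that workflow, under stated assumptions: the seed, the number of replications, and the hypothesised value g0 = 0 are arbitrary illustrative choices, not part of the example above. Observations with missing rep78 are dropped first, in line with the warning -bootstrap- prints, so that every resample is drawn from the 69 cars that -tabulate- actually uses.

```stata
* Sketch only: seed, reps() and g0 are illustrative assumptions
sysuse auto, clear
drop if missing(rep78)      // keep only the observations tabulate will use
set seed 2020
bootstrap g = r(gamma), reps(1000) nodots: tabulate foreign rep78, gamma
estat bootstrap, percentile

* e(ci_percentile) is a 2 x 1 matrix: row 1 = lower limit, row 2 = upper limit
matrix ci = e(ci_percentile)
local lo = ci[1,1]
local hi = ci[2,1]
local g0 = 0                // hypothesised value of gamma (assumed)
display "percentile CI: [" %6.4f `lo' ", " %6.4f `hi' "]"
display cond(inrange(`g0', `lo', `hi'), "g0 is inside the interval", "g0 is outside the interval")
```

Because the percentile limits are taken from the bootstrap distribution of gamma itself, they cannot fall outside the range from -1 to 1.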

Last edited by Nick Cox; 31 Oct 2020, 04:05.

Mike Lacy

Join Date: Apr 2014
Posts: 2416

31 Oct 2020, 08:33

I have several scholarly (i.e., pedantic) comments here, motivated by my coming from the discipline (sociology) where Gamma gained its one-time popularity:

1) While I think that randomization approaches (-bootstrap-, -permute-) are generally preferable, I'd defend the asymptotic performance of gamma. The large value of gamma in Nick's sample (near the upper limit of 1) presents a tough situation. In my experience, asymptotic approaches to gamma give surprisingly good CI performance. (My recollection is that asymptotic normality here is known to depend on the number of concordant and discordant pairs, which is generally much larger than the sample size.)

2) Gamma has a different sampling distribution under the null than under the alternative, so the formula for the standard error of gamma differs between a hypothesis test and a confidence interval. Stata implements only the asymptotic SE formula for a CI, which will often not give a great result for a p-value.

3) Presuming a randomization approach, if a hypothesis test rather than a CI is desired, we should choose -permute-, which simulates presuming the null hypothesis is true, as opposed to -bootstrap-, which presumes the alternative hypothesis is true.

4) In any event, I'd almost always choose Somers' D over Gamma (-ssc describe somersd-), which was developed as a version of Gamma that explicitly recognizes that one variable is the explanatory variable and the other the response. And, per Roger Newson's publications cited in -somersd-, it has a number of interesting connections to other more popular statistics.
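
For readers who have not met these measures, the standard definitions behind points 1 and 4 can be written in terms of pair counts. The notation is introduced here for illustration: $$N_c$$ and $$N_d$$ are the numbers of concordant and discordant pairs, and $$T_Y$$ is the number of pairs tied on the response Y but not on the explanatory variable X.

$$\gamma = \frac{N_c - N_d}{N_c + N_d}, \qquad D_{YX} = \frac{N_c - N_d}{N_c + N_d + T_Y}$$

Gamma ignores tied pairs entirely, whereas Somers' D keeps the pairs that differ on X but are tied on Y in the denominator, so its absolute value is never larger than that of gamma.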

Here's an illustration of some of the points above, using the auto data modified to give a more typical sample value of Gamma.

Code:

. clear
. set seed 3759
. sysuse auto
(1978 Automobile Data)

. tab rep78 foreign, gamma

    Repair |
    Record |       Car type
      1978 |  Domestic    Foreign |     Total
-----------+----------------------+----------
         1 |         2          0 |         2
         2 |         8          0 |         8
         3 |        27          3 |        30
         4 |         9          9 |        18
         5 |         2          9 |        11
-----------+----------------------+----------
     Total |        48         21 |        69

                    gamma =   0.8768  ASE = 0.064

. // Change data to give a more typical value of Gamma
. replace foreign =!foreign if (runiform() > 0.7)
(28 real changes made)

. tab rep78 foreign, gamma

    Repair |
    Record |       Car type
      1978 |  Domestic    Foreign |     Total
-----------+----------------------+----------
         1 |         2          0 |         2
         2 |         5          3 |         8
         3 |        17         13 |        30
         4 |         9          9 |        18
         5 |         3          8 |        11
-----------+----------------------+----------
     Total |        36         33 |        69

                    gamma =   0.3625  ASE = 0.169

. local gamma = r(gamma)
. local se1 = r(ase_gam)
. local width = invnormal(0.975) * `se1'

. di "By formula: LCL = " r(gamma) - `width' ", UCL = " r(gamma) + `width'
By formula: LCL = .03215211, UCL = .69275934

. quiet bootstrap g = r(gamma), reps(10000) bca nodots: ///
>    tab rep78 foreign, gamma

. di "By bootstrap: LCL = " el(r(table), 5, 1) ", UCL = "el(r(table), 6, 1)
By bootstrap: LCL = .03441267, UCL = .69049878

. //
. tempfile temp
. quiet permute foreign g = r(gamma), right saving(`temp') reps(10000) nodots: ///
>    tab rep78 foreign, gamma

. di "p from permute = " el(r(p), 1,1)  
p from permute = .0223

. di "p from CI ASE = " 1-normal(`gamma'/`se1')
p from CI ASE = .01574801

. quiet use `temp', clear
. quiet summ g
. di "asymptotic SE of gamma = " %6.4f `se1' ", permute SE of gamma = " %6.4f r(sd)
asymptotic SE of gamma = 0.1685, permute SE of gamma = 0.1839
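
To make the link between the saved permutation results and the reported p-value explicit, here is a hedged sketch that recomputes a right-tail p-value by hand from the results file. It uses the unmodified auto data, so the numbers will differ from the log above; the seed, the number of replications, and the macro names (obs, perm, n) are illustrative assumptions.

```stata
* Sketch only: seed, reps() and macro names are illustrative assumptions
sysuse auto, clear
drop if missing(rep78)
quietly tabulate rep78 foreign, gamma
local obs = r(gamma)        // observed gamma in the actual data

tempfile perm
set seed 2020
quietly permute foreign g = r(gamma), right saving(`perm') reps(1000) nodots: ///
    tabulate rep78 foreign, gamma

* The results file holds one permuted gamma per replication in variable g
use `perm', clear
quietly count if !missing(g)
local n = r(N)
* g is stored as a float by default, so compare against float(obs)
quietly count if g >= float(`obs') & !missing(g)
local p = r(N)/`n'
display "right-tail permutation p = " %6.4f `p'
```

The proportion of permuted gammas at least as large as the observed one is the quantity that the right option of -permute- reports as its p-value.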
