Hoeffding’s D correlation

Tommaso Salvitti

Join Date: Nov 2019

Posts: 165
#1

30 Oct 2024, 05:22

Hi everybody, I need to calculate Hoeffding's D correlation coefficient. I couldn't find much in the literature, so I would appreciate your views. The objective is to assess how strongly variable A and variable B are related. Is the following syntax correct?

Code:

somersd A B

A and B are not binary variables; both are numeric variables.

Thanks in advance

Tommaso
Nick Cox

Join Date: Mar 2014

Posts: 35436
#2

30 Oct 2024, 05:55

This is a puzzling question. Hoeffding's d and Somers' d are, as I understand it, different beasts that happen to share notation.

A highly prejudiced view is that such measures are interesting programming challenges for those so inclined but mostly a distraction. They put numbers on strength of relationship without otherwise saying anything about what the relationship is, so the exercise is likely to turn out to be a dead end.

That said, the measure implemented in

Code:

ssc desc bkrosenblatt

is closer to Hoeffding's d than to Somers' d. Despite the high level of people behind it (three of the four people discussed in the help were members of the US National Academy of Sciences) I can't say that I've ever found the measure helpful in data analysis.
Bruce Weaver

Join Date: May 2014

Posts: 1119
#3

30 Oct 2024, 07:59

I had never heard of Hoeffding's D. There is a brief description of it on slide 18 here. And I was able to download a PDF of the 1948 article by doing a Google Scholar search on <Hoeffding 1948 a non-parametric test of independence>.

--
Bruce Weaver
Email: [email protected]
Version: Stata/MP 18.5 (Windows)
Felix Bittmann

Join Date: Aug 2018

Posts: 663
#4

30 Oct 2024, 08:14

Really interesting slides! Is the presented MIC (Maximal Information Coefficient) implemented in Stata somewhere?

Best wishes

(Stata 16.1 MP)
Leonardo Guizzetti

Join Date: Jul 2016

Posts: 2389
#5

30 Oct 2024, 08:16

Why not use one of the more familiar and well-understood chi-squared tests of independence?
Nick Cox

Join Date: Mar 2014

Posts: 35436
#6

30 Oct 2024, 09:31

Chatterjee, Sourav. 2020. “A New Coefficient of Correlation.” Journal of the American Statistical Association 116 (536): 2009–22. doi:10.1080/01621459.2020.1758115.

is a further claimant in this territory.

Comments and references in the blog entry https://statmodeling.stat.columbia.e...tion-measures/ seem to encapsulate the promises, problems, and pitfalls here. Each time that someone claims some new universal correlation measure there are usually objections and counter-examples that it has (far) more problems and pitfalls and less scope than the original authors claimed.
Felix Bittmann

Join Date: Aug 2018

Posts: 663
#7

30 Oct 2024, 11:19

I remember this paper; we had an interesting discussion here: https://www.statalist.org/forums/for...of-correlation

Best wishes

(Stata 16.1 MP)
Nick Cox

Join Date: Mar 2014

Posts: 35436
#8

30 Oct 2024, 12:28

Felix Bittmann Thanks for the reminder of an intriguing thread.
George Ford

Join Date: Aug 2014

Posts: 3120
#9

30 Oct 2024, 14:57

Try this. It appears to be a modification of Hoeffding's D.

Code:

net install bkrosenblatt.pkg , from(http://fmwww.bc.edu/RePEc/bocode/b/)

Written by Nick Cox.

Last edited by George Ford; 30 Oct 2024, 14:59.
George Ford

Join Date: Aug 2014

Posts: 3120
#10

30 Oct 2024, 15:05

Oops. Nick already recommended it.

George Ford

Join Date: Aug 2014
Posts: 3120

#11

30 Oct 2024, 15:22

I translated the code at

HTML Code:

https://github.com/Dicklesworthstone/hoeffdings_d_explainer

to Stata. The results match. I can't promise the original was correct. An argument for Hoeffding's D is provided at the link.

Code:

clear all
    
program define hoeffding_d_ties , rclass
    version 14.0
    syntax varlist(min=2 max=2) [if] [in]
    
    marksample touse
    local x: word 1 of `varlist'
    local y: word 2 of `varlist'
    
    tempvar rx ry Q
    tempname D D1 D2 D3
    
    ** Get sample size
    quietly count if `touse'
    local n = r(N)
    
    ** Calculate ranks
    quietly {
        egen `rx' = rank(`x') if `touse'
        egen `ry' = rank(`y') if `touse'
    }
    
    ** Generate Q values with tie handling
    generate double `Q' = 1 if `touse'
    
    ** Loop through all observations, skipping any excluded by if/in or missing values
    quietly {
        forvalues i = 1/`=_N' {
            if !`touse'[`i'] continue
            local xi = `rx'[`i']
            local yi = `ry'[`i']
            
            ** Count points with both lower ranks
            count if `rx' < `xi' & `ry' < `yi' & `touse'
            replace `Q' = `Q' + r(N) in `i'
            
            ** Handle ties in both variables (1/4 contribution)
            count if `rx' == `xi' & `ry' == `yi' & `touse'
            replace `Q' = `Q' + (r(N) - 1) / 4 in `i'
            
            ** Handle ties in x only (1/2 contribution)
            count if `rx' == `xi' & `ry' < `yi' & `touse'
            replace `Q' = `Q' + r(N) / 2 in `i'
            
            ** Handle ties in y only (1/2 contribution)
            count if `rx' < `xi' & `ry' == `yi' & `touse'
            replace `Q' = `Q' + r(N) / 2 in `i'
        }
    }
    
    ** Calculate D1
    tempvar term1
    generate double `term1' = (`Q' - 1) * (`Q' - 2) if `touse'
    quietly summarize `term1' if `touse'
    scalar `D1' = r(sum)
    
    ** Calculate D2
    tempvar term2
    generate double `term2' = (`rx' - 1) * (`rx' - 2) * (`ry' - 1) * (`ry' - 2) if `touse'
    quietly summarize `term2' if `touse'
    scalar `D2' = r(sum)
    
    ** Calculate D3
    tempvar term3
    generate double `term3' = (`rx' - 2) * (`ry' - 2) * (`Q' - 1) if `touse'
    quietly summarize `term3' if `touse'
    scalar `D3' = r(sum)
    
    ** Calculate final Hoeffding's D
    scalar `D' = 30 * ((`n' - 2) * (`n' - 3) * `D1' + `D2' - 2 * (`n' - 2) * `D3') / ///
                 (`n' * (`n' - 1) * (`n' - 2) * (`n' - 3) * (`n' - 4))
    
    ** Display results
    display as text "Hoeffding's D statistic = " as result scalar(`D')
    
    ** Return values
    return scalar D = scalar(`D')
    return scalar N = `n'
    return scalar D1 = scalar(`D1')
    return scalar D2 = scalar(`D2')
    return scalar D3 = scalar(`D3')
end

input x y
    55 125
    62 145
    68 160
    70 156
    72 190
    65 150
    67 165
    78 250
    78 250 
    78 250
    end

hoeffding_d_ties x y
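
For anyone checking the program against a formula, the quantities it computes can be written as below. This is only a transcription of the code above, not a statement taken from Hoeffding's paper or the linked explainer. The symbols are notation assumed here for illustration: R_i and S_i are the ranks of x and y (as returned by egen's rank(), which averages ties), Q_i is the tie-adjusted count built in the loop, and n is the number of observations used.

```latex
\begin{aligned}
Q_i &= 1 + \#\{j : R_j < R_i,\ S_j < S_i\}
       + \tfrac{1}{4}\bigl(\#\{j : R_j = R_i,\ S_j = S_i\} - 1\bigr) \\
    &\quad + \tfrac{1}{2}\,\#\{j : R_j = R_i,\ S_j < S_i\}
       + \tfrac{1}{2}\,\#\{j : R_j < R_i,\ S_j = S_i\}, \\[4pt]
D_1 &= \sum_{i=1}^{n} (Q_i - 1)(Q_i - 2), \\
D_2 &= \sum_{i=1}^{n} (R_i - 1)(R_i - 2)(S_i - 1)(S_i - 2), \\
D_3 &= \sum_{i=1}^{n} (R_i - 2)(S_i - 2)(Q_i - 1), \\[4pt]
D   &= 30\,\frac{(n-2)(n-3)\,D_1 + D_2 - 2(n-2)\,D_3}
               {n(n-1)(n-2)(n-3)(n-4)}.
\end{aligned}
```

The denominator contains the factor (n - 4), so the expression is only defined for n of at least 5. In Q_i, the count of observations tied with i on both ranks includes i itself, which is why 1 is subtracted before the 1/4 weight is applied.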


Tommaso Salvitti

Join Date: Nov 2019

Posts: 165
#12

05 Nov 2024, 23:36

Thanks, Leonardo Guizzetti. I think I will use the more familiar chi-squared test.

Thanks a lot to everybody.