
  • Hoeffding’s D correlation


    Hi everybody. I need to calculate Hoeffding's D correlation coefficient. I couldn't find much in the literature, so I would be grateful for your advice. The objective is to measure the correlation between variable A and variable B. Is the following syntax correct?
    HTML Code:
    somersd A B
    A and B are not binary variables; both are numeric.

    Thanks in advance


    Tommaso

  • #2
    This is a puzzling question. Hoeffding's d and Somers' d are, as I understand it, different beasts that happen to share notation.

    A highly prejudiced view is that such measures are interesting programming challenges for those so inclined but mostly a distraction. They put numbers on strength of relationship without otherwise saying anything about what the relationship is, so the exercise is likely to turn out to be a dead end.

    That said, the measure implemented in

    Code:
    ssc desc bkrosenblatt
    is closer to Hoeffding's d than to Somers' d. Despite the high level of people behind it (three of the four people discussed in the help were members of the US National Academy of Sciences) I can't say that I've ever found the measure helpful in data analysis.



    • #3
      I had never heard of Hoeffding's D. There is a brief description of it on slide 18 here. And I was able to download a PDF of the 1948 article by doing a Google Scholar search on <Hoeffding 1948 a non-parametric test of independence>.
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)



      • #4
        Really interesting slides! Is the presented MIC (Maximal Information Coefficient) implemented in Stata somewhere?
        Best wishes

        (Stata 16.1 MP)



        • #5
          Why not use one of the more familiar and well-understood chi-squared tests of independence?
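
          A minimal sketch of what that could look like, assuming both variables take only a modest number of distinct values (a chi-squared test works on a contingency table, so continuous variables would first have to be grouped). The example uses the auto dataset shipped with Stata, not the A and B from the original post:

          Code:
          * Load an example dataset that ships with Stata (illustration only)
          sysuse auto, clear
          * Two-way table with Pearson's chi-squared test of independence
          tabulate rep78 foreign, chi2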



          • #6
            Chatterjee, Sourav. 2020. “A New Coefficient of Correlation.” Journal of the American Statistical Association 116 (536): 2009–22. doi:10.1080/01621459.2020.1758115.

            is a further claimant in this territory.

            Comments and references in the blog entry https://statmodeling.stat.columbia.e...tion-measures/ seem to encapsulate the promises, problems, and pitfalls here. Each time that someone claims some new universal correlation measure there are usually objections and counter-examples that it has (far) more problems and pitfalls and less scope than the original authors claimed.



            • #7
              I remember this paper; we had an interesting discussion here: https://www.statalist.org/forums/for...of-correlation
              Best wishes

              (Stata 16.1 MP)



              • #8
                Felix Bittmann, thanks for the reminder of an intriguing thread.



                • #9
                  Try this. It appears to be a modification of Hoeffding's D.

                  Code:
                  net install bkrosenblatt.pkg , from(http://fmwww.bc.edu/RePEc/bocode/b/)
                  Written by Nick Cox.
                  Last edited by George Ford; 30 Oct 2024, 15:59.



                  • #10
                    Oops. Nick already recommended it.



                    • #11
                      I translated https://github.com/Dicklesworthstone/hoeffdings_d_explainer to Stata. The results match, although I can't promise the original was correct. An argument for Hoeffding's D is provided at the link.
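
                      For reference, the quantity evaluated at the end of the program below can be written out as follows. The symbols R_i, S_i and Q_i are introduced here for illustration and are not taken from the linked explainer: R_i and S_i are the ranks of x and y for observation i, and Q_i is 1 plus the number of observations with both ranks lower, with ties counted as 1/2 (tied on one variable) or 1/4 (tied on both).

                      \[
                      D = \frac{30\,\bigl[(n-2)(n-3)\,D_1 + D_2 - 2(n-2)\,D_3\bigr]}{n(n-1)(n-2)(n-3)(n-4)}
                      \]
                      \[
                      D_1 = \sum_{i=1}^{n} (Q_i-1)(Q_i-2), \quad
                      D_2 = \sum_{i=1}^{n} (R_i-1)(R_i-2)(S_i-1)(S_i-2), \quad
                      D_3 = \sum_{i=1}^{n} (R_i-2)(S_i-2)(Q_i-1)
                      \]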

                      Code:
                      clear all
                          
                      program define hoeffding_d_ties , rclass
                          version 14.0
                          syntax varlist(min=2 max=2) [if] [in]
                          
                          marksample touse
                          local x: word 1 of `varlist'
                          local y: word 2 of `varlist'
                          
                          tempvar rx ry Q
                          tempname D n D1 D2 D3
                          
                          ** Get sample size
                          quietly count if `touse'
                          local n = r(N)
                          
                          ** Calculate ranks
                          quietly {
                              egen `rx' = rank(`x') if `touse'
                              egen `ry' = rank(`y') if `touse'
                          }
                          
                          ** Generate Q values with tie handling
                          generate double `Q' = 1 if `touse'
                          
                          ** Loop through observations to calculate Q values,
                          ** skipping any observation excluded by if/in
                          quietly {
                              forvalues i = 1/`=_N' {
                                  if !`touse'[`i'] {
                                      continue
                                  }
                                  local xi = `rx'[`i']
                                  local yi = `ry'[`i']
                                  
                                  ** Count points with both lower ranks
                                  count if `rx' < `xi' & `ry' < `yi' & `touse'
                                  replace `Q' = `Q' + r(N) in `i'
                                  
                                  ** Handle ties in both variables (1/4 contribution)
                                  count if `rx' == `xi' & `ry' == `yi' & `touse'
                                  replace `Q' = `Q' + (r(N) - 1) / 4 in `i'
                                  
                                  ** Handle ties in x only (1/2 contribution)
                                  count if `rx' == `xi' & `ry' < `yi' & `touse'
                                  replace `Q' = `Q' + r(N) / 2 in `i'
                                  
                                  ** Handle ties in y only (1/2 contribution)
                                  count if `rx' < `xi' & `ry' == `yi' & `touse'
                                  replace `Q' = `Q' + r(N) / 2 in `i'
                              }
                          }
                          
                          ** Calculate D1
                          tempvar term1
                          generate double `term1' = (`Q' - 1) * (`Q' - 2) if `touse'
                          quietly summarize `term1' if `touse'
                          scalar `D1' = r(sum)
                          
                          ** Calculate D2
                          tempvar term2
                          generate double `term2' = (`rx' - 1) * (`rx' - 2) * (`ry' - 1) * (`ry' - 2) if `touse'
                          quietly summarize `term2' if `touse'
                          scalar `D2' = r(sum)
                          
                          ** Calculate D3
                          tempvar term3
                          generate double `term3' = (`rx' - 2) * (`ry' - 2) * (`Q' - 1) if `touse'
                          quietly summarize `term3' if `touse'
                          scalar `D3' = r(sum)
                          
                          ** Calculate final Hoeffding's D
                          scalar `D' = 30 * ((`n' - 2) * (`n' - 3) * `D1' + `D2' - 2 * (`n' - 2) * `D3') / ///
                                       (`n' * (`n' - 1) * (`n' - 2) * (`n' - 3) * (`n' - 4))
                          
                          ** Display results
                          display as text "Hoeffding's D statistic = " as result scalar(`D')
                          
                          ** Return values
                          return scalar D = scalar(`D')
                          return scalar N = `n'
                          return scalar D1 = scalar(`D1')
                          return scalar D2 = scalar(`D2')
                          return scalar D3 = scalar(`D3')
                      end
                      
                      input x y
                          55 125
                          62 145
                          68 160
                          70 156
                          72 190
                          65 150
                          67 165
                          78 250
                          78 250 
                          78 250
                          end
                      
                      hoeffding_d_ties x y



                      • #12
                        Thanks, Leonardo Guizzetti. I think I will use the more familiar chi-squared test.

                        Thanks a lot to everybody.
