
  • heatmap from correlation table (matrix)

    This post shares how to make a heatmap from a correlation matrix (as in the attached picture).

    I'm sure there are other (and better) ways to do this, so feel free to point them out below. Also feel free to turn this into an ado-file if you like.

    I hope it helps someone:

    Code:
    // correlation definitions
    // -----------------------
    sysuse auto, clear
    loc myvars price mpg rep78 headroom weight length displacement gear_ratio foreign
    
    
    pwcorr `myvars'
    mat A = r(C)
    clear
    * svmat2 is community-contributed (Nick Cox); -findit svmat2- to install
    svmat2 A, rnames(vars)
    gen id1 = _n
    reshape long A, i(vars) j(id2)
    order vars id1 id2 A
    sort id1 id2 
    ren A corr
    
    format %9.3f corr 
    
    
    
    // heatmap definitions
    // -------------------
    loc targetvar corr
    loc xnum id1
    loc ynum id2
    
    * axis labels: y axis shows variable names, x axis shows numbers only
    loc num = 0 
    loc ylabels ""
    loc xlabels ""
    foreach x of loc myvars {
    loc num = `num'+1 
    loc ylabels = `"`ylabels' `num' "`x'""'
    loc xlabels = `"`xlabels' `num' "' 
    }
    di `"`ylabels'"'
    di `"`xlabels'"'
    
    * colors & options
    * ----------------
    loc poscol blue      // color of positive cells (e.g. green)
    loc negcol yellow    // color of negative cells (e.g. red)
    loc msize = 5        // marker size
    loc posmlabc white   // label color on positive cells
    loc negmlabc black   // label color on negative cells
    loc mlabs = 2        // label size
    loc hmoptions = `"leg(off) xsize(10) ysize(10) ysc(rev) xsc(alt) plotr(fc(white) lc(black)  m(3 3 3 3)) graphr(c(white)) xtitle(" ") ytitle(" ") "' 
    
    * thresholds
    * ----------
    loc posmax = 1
    loc negmax = -1 
    
    loc posstep = 10 // add more or less positive shades
    loc negstep = 10 // add more or less negative shades
    
    loc myhm = ""
    // positive values: one scatter layer per bin, darker shades for larger values
    loc step= `posmax'/`posstep'
    loc prevx = 0
    forv x = 0 (`step') `posmax' {
    loc myhm = "`myhm' (scatter `ynum' `xnum' if `targetvar'>`prevx' & `targetvar'<=`x'" ///
             + ", ml(`targetvar') mlabs(`mlabs') mlabc(`posmlabc') mlabpos(0) m(S) msize(`msize') mc(`poscol'*" ///
             + strofreal(`x'/`posmax')+"))"
    loc prevx = `x'
    di "`x'"
    }
    loc myhm = "`myhm' (scatter `ynum' `xnum' if `targetvar'>`prevx' & `targetvar'<=`posmax'" ///
             + ", ml(`targetvar') mlabs(`mlabs') mlabc(`posmlabc') mlabpos(0) m(S) msize(`msize') mc(`poscol'*1))"
    
    // negative val loops
    loc step= `negmax'/`negstep'
    loc prevx = 0
    forv x = 0 (`step') `negmax' {
    loc myhm = "`myhm' (scatter `ynum' `xnum' if `targetvar'<`prevx' & `targetvar'>=`x'" ///
             + ", ml(`targetvar') mlabs(`mlabs') mlabc(`negmlabc') mlabpos(0) m(S) msize(`msize') mc(`negcol'*" ///
             + strofreal(`x'/`negmax')+"))"
    loc prevx = `x'
    di "`x'"
    }
    loc myhm = "`myhm' (scatter `ynum' `xnum' if `targetvar'<`prevx' & `targetvar'>=`negmax'" ///
             + ", ml(`targetvar') mlabs(`mlabs') mlabc(`negmlabc') mlabpos(0) m(S) msize(`msize') mc(`negcol'*1))"
             
    di "`myhm'"
    gr tw `myhm' , `hmoptions' ///
      ylabel(`ylabels' , noticks labs(2.66) labgap(vsmall) angle(0) nogrid) /// 
      xlabel(`xlabels' , noticks labs(2.66) labgap(vsmall) angle(0) nogrid )
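svmat2 is community-contributed, so if it is not installed, the data-preparation step can also be done with official commands only. A minimal sketch, assuming the same variable list as above; it is meant to produce the same variables (vars, id1, id2, corr) that the heatmap part of the code expects, using the official svmat and a loop to attach the row names.

```stata
* Long dataset of pairwise correlations using official Stata only
* (alternative to svmat2; variable names match the code above)
sysuse auto, clear
local myvars price mpg rep78 headroom weight length displacement gear_ratio foreign
local k : word count `myvars'

quietly pwcorr `myvars'
matrix A = r(C)

clear
quietly svmat A              // one observation per row of A: variables A1 ... Ak
generate id1 = _n            // row index
generate str32 vars = ""     // row variable name, filled in below
forvalues i = 1/`k' {
    local name : word `i' of `myvars'
    quietly replace vars = "`name'" in `i'
}

* one observation per (row, column) pair; id2 is the column index
quietly reshape long A, i(id1) j(id2)
rename A corr
format %9.3f corr
sort id1 id2
```

After this, the "heatmap definitions" part of the code can be run unchanged.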




  • #2
    I can see you put a lot of effort into this graph. I just want to point out that, for some users, the ado heatplot by Ben Jann might do a similar job; see slide 30 onward: https://www.stata.com/meeting/german...any19_Jann.pdf
    Best wishes

    (Stata 16.1 MP)
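For comparison, a minimal sketch of the heatplot route, which works directly on the correlation matrix. heatplot and the packages it relies on (palettes, colrspace) are on SSC; the options below follow my reading of the heatplot help file, so check -help heatplot- if anything errors.

```stata
* One-time installation from SSC (uncomment if needed)
* ssc install heatplot, replace
* ssc install palettes, replace
* ssc install colrspace, replace

sysuse auto, clear
local myvars price mpg rep78 headroom weight length displacement gear_ratio foreign
quietly pwcorr `myvars'
matrix C = r(C)

* heatmap of the matrix, cell values shown to three decimals,
* diverging color scheme (option syntax as read from the help file)
heatplot C, values(format(%9.3f)) color(hcl, diverging intensity(.6)) ///
    legend(off) aspectratio(1)
```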



    • #3
      If people want to do this, there are at least two other strategies: create a dataset of correlations and then fire up a heatmap command (heatplot from Ben Jann on SSC seems likely to be the most versatile) or use a direct command.

      On the latter, I wrote corrtable (SSC) for a friend who seemed very pleased with the result, but for my own work I've never found it much superior to a scatter plot matrix or to the correlation matrix itself. The goals and expectations aren't identical, however, and I don't think there is an easy compromise. For, say, 3 variables, showing the scatter plots with annotation is better. For, say, 30 variables, nothing much works well, graph or table. For, say, 10 variables, there should be a sweet spot in which a graph is more striking and more useful than a table, but I've found it elusive.
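For the few-variables case, one way to get scatter plots with annotation is to draw each pair separately with its correlation in the title and then combine the panels. A minimal sketch; the three auto variables are chosen only for illustration, and -graph matrix price mpg weight, half- gives the unannotated scatter plot matrix for comparison.

```stata
sysuse auto, clear
local vars price mpg weight
local k : word count `vars'
local graphs ""

* one scatter plot per pair of variables, Pearson r shown in the title
forvalues i = 1/`=`k'-1' {
    forvalues j = `=`i'+1'/`k' {
        local y : word `i' of `vars'
        local x : word `j' of `vars'
        quietly correlate `y' `x'
        local r : display %4.2f r(rho)
        scatter `y' `x', name(g`i'`j', replace) title("r = `r'") nodraw
        local graphs `graphs' g`i'`j'
    }
}

* put the stored panels side by side
graph combine `graphs'
```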

      The order of the variables is in practice a crucial detail.
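Since the order matters, one simple heuristic (an assumption of mine, not a method proposed in this thread) is to sort the variables by their loadings on the first principal component, so that variables that behave alike tend to sit next to each other. A minimal sketch; the resulting list can be used as myvars in the code from the first post.

```stata
sysuse auto, clear
local myvars price mpg rep78 headroom weight length displacement gear_ratio foreign

* heuristic (assumption): order by loadings on the first principal component
quietly pca `myvars'

mata:
    L     = st_matrix("e(L)")                  // loadings: variables x components
    names = st_matrixrowstripe("e(L)")[., 2]   // variable names from the row stripe
    perm  = order(L[., 1], 1)                  // ascending first-component loading
    st_local("ordered", invtokens(names[perm]'))
end

display "`ordered'"
```

The sign of a principal component is arbitrary, so the list may come out reversed; for a heatmap that only flips the picture.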

