
  • Heatmap of variables by region

    Hi Stata users,

    I have a dataset with several variables and a region identifier, and I would like to explore spatial differences in these variables.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(sat_1 sat_2 sat_3 sat_4 sat_5) str8 region
    0 1 0 2 4 "Region 1"
    0 1 1 5 2 "Region 5"
    1 1 0 4 2 "Region 2"
    1 1 0 1 3 "Region 5"
    0 1 0 3 1 "Region 1"
    1 1 0 2 3 "Region 4"
    0 1 1 4 1 "Region 3"
    0 0 0 1 2 "Region 5"
    0 1 1 3 1 "Region 4"
    0 1 1 5 5 "Region 2"
    0 0 1 4 4 "Region 4"
    0 0 1 2 4 "Region 5"
    1 0 0 2 2 "Region 1"
    1 0 0 3 2 "Region 5"
    1 1 0 4 1 "Region 4"
    0 0 0 2 2 "Region 2"
    0 0 1 2 1 "Region 5"
    0 0 1 1 5 "Region 5"
    0 1 0 1 1 "Region 5"
    1 0 1 5 4 "Region 1"
    0 0 1 4 5 "Region 5"
    1 0 1 4 3 "Region 2"
    1 1 1 3 1 "Region 5"
    1 1 1 3 3 "Region 3"
    0 0 0 4 5 "Region 3"
    0 0 1 1 3 "Region 1"
    0 1 0 5 1 "Region 2"
    1 0 0 4 3 "Region 2"
    0 0 0 3 1 "Region 3"
    1 1 1 1 4 "Region 5"
    1 0 1 3 3 "Region 2"
    1 1 1 5 1 "Region 1"
    1 0 1 2 4 "Region 2"
    1 1 1 1 5 "Region 2"
    0 1 1 4 3 "Region 3"
    1 1 0 1 1 "Region 4"
    0 1 0 4 3 "Region 2"
    1 1 1 4 2 "Region 1"
    1 0 1 3 1 "Region 2"
    1 1 0 4 1 "Region 3"
    0 0 1 5 1 "Region 2"
    0 0 1 4 5 "Region 4"
    0 0 1 3 5 "Region 5"
    1 0 1 3 2 "Region 3"
    0 0 0 3 1 "Region 2"
    1 1 0 5 5 "Region 5"
    0 1 1 2 1 "Region 3"
    0 0 0 2 1 "Region 3"
    0 1 0 3 5 "Region 4"
    1 1 0 5 3 "Region 1"
    end
    I think the best summary for a region would be the mean.
    The visualization I have in mind is a heatmap like the one shown below:

    [Image: heatmap.png]


    I would appreciate any guidance.

    Thanks in advance
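For reference, the regional means themselves can be computed with official Stata commands before any graphing. This is a minimal sketch, assuming the dataex data above are in memory. Note that collapse replaces the data in memory, so wrap it in preserve/restore (as the replies below do) if you need the individual responses afterwards.

```stata
* Sketch: assumes the -dataex- data above are in memory
* table of means by region; the data in memory are not changed
tabstat sat_1 sat_2 sat_3 sat_4 sat_5, by(region) statistics(mean) format(%4.2f)

* alternatively, replace the data with one row of means per region
collapse (mean) sat_1 sat_2 sat_3 sat_4 sat_5, by(region)
list, noobs
```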

  • #2
    Stephen:
    you may want to take a look at: https://www.statalist.org/forums/for...atmap-in-stata
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      A heat map can work well (meaning as well as or better than the alternatives) for a problem of size, say, 10 x 10 (or much bigger), ideally when

      * there is some natural order for one or both axes

      * you hope for a big picture impression

      * major anomalies may spring out at the reader.

      BUT you are totally reliant on readers' ability to decode colour as indicating quantity.

      If your problem really is the size presented in #1, you can keep much more quantitative detail in a dot or bar chart.

      Here is a token example. I am guessing that each variable is on a scale from 0 to 5. For a different emphasis, swap the row and column variables.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte(sat_1 sat_2 sat_3 sat_4 sat_5) str8 region
      0 1 0 2 4 "Region 1"
      0 1 1 5 2 "Region 5"
      1 1 0 4 2 "Region 2"
      1 1 0 1 3 "Region 5"
      0 1 0 3 1 "Region 1"
      1 1 0 2 3 "Region 4"
      0 1 1 4 1 "Region 3"
      0 0 0 1 2 "Region 5"
      0 1 1 3 1 "Region 4"
      0 1 1 5 5 "Region 2"
      0 0 1 4 4 "Region 4"
      0 0 1 2 4 "Region 5"
      1 0 0 2 2 "Region 1"
      1 0 0 3 2 "Region 5"
      1 1 0 4 1 "Region 4"
      0 0 0 2 2 "Region 2"
      0 0 1 2 1 "Region 5"
      0 0 1 1 5 "Region 5"
      0 1 0 1 1 "Region 5"
      1 0 1 5 4 "Region 1"
      0 0 1 4 5 "Region 5"
      1 0 1 4 3 "Region 2"
      1 1 1 3 1 "Region 5"
      1 1 1 3 3 "Region 3"
      0 0 0 4 5 "Region 3"
      0 0 1 1 3 "Region 1"
      0 1 0 5 1 "Region 2"
      1 0 0 4 3 "Region 2"
      0 0 0 3 1 "Region 3"
      1 1 1 1 4 "Region 5"
      1 0 1 3 3 "Region 2"
      1 1 1 5 1 "Region 1"
      1 0 1 2 4 "Region 2"
      1 1 1 1 5 "Region 2"
      0 1 1 4 3 "Region 3"
      1 1 0 1 1 "Region 4"
      0 1 0 4 3 "Region 2"
      1 1 1 4 2 "Region 1"
      1 0 1 3 1 "Region 2"
      1 1 0 4 1 "Region 3"
      0 0 1 5 1 "Region 2"
      0 0 1 4 5 "Region 4"
      0 0 1 3 5 "Region 5"
      1 0 1 3 2 "Region 3"
      0 0 0 3 1 "Region 2"
      1 1 0 5 5 "Region 5"
      0 1 1 2 1 "Region 3"
      0 0 0 2 1 "Region 3"
      0 1 0 3 5 "Region 4"
      1 1 0 5 3 "Region 1"
      end
      
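      preserve

      * an identifier for each observation, and region as a number 1 to 5
      gen id = _n
      gen reg = real(substr(region, -1, 1))

      * one row per (observation, variable); which = 1 to 5 indexes sat_1 to sat_5
      reshape long sat_, i(id) j(which)

      * mean of each variable within each region
      collapse sat_, by(reg which)

      set scheme s1color

      * tabplot is community-contributed (Stata Journal); type -search tabplot- to find and install it
      tabplot reg which [iw=sat_] , subtitle(mean sat(isfaction?)) ytitle(region) xtitle(better explanation) frame(5) showval(format(%3.2f))

      restore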
      [Image: satisfactiin.png]




      • #4
        Nick Cox, this is super helpful. Thanks so much for the guidance. One small note: the scales of the responses may differ.



        • #5
          If the scales (meaning, limits) of the responses differ, it is moot whether throwing them as they come into the same graph is a good idea. But again, my bias is that this is easier to think about with a bar or dot chart than with a heat map.
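Following the bar-or-dot-chart suggestion, here is a minimal sketch using official graph dot. It assumes the dataex data from #1 are in memory and, judging from that listing, that sat_1 to sat_3 are 0/1 while sat_4 and sat_5 run from 1 to 5. Splitting the two kinds of scale into separate panels is a design choice of this sketch, not something proposed in the thread; the graph names binary and scale15 are arbitrary.

```stata
* Sketch: assumes the -dataex- data from #1 are in memory
* means of 0/1 indicators are proportions, so they get their own panel
graph dot (mean) sat_1 sat_2 sat_3, over(region) title("sat_1-sat_3 (0/1)") name(binary, replace)

* means of the 1-5 items go in a separate panel, so the two scales are not mixed
graph dot (mean) sat_4 sat_5, over(region) title("sat_4, sat_5 (1-5)") name(scale15, replace)

* show the two panels side by side
graph combine binary scale15, cols(2)
```

Each panel keeps the means on their own scale, so no rescaling needs explaining to readers.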



          • #6
            Closer inspection suggests that your variables are quite different, some being (0, 1) indicators and others on a scale from 1 to 5.

            One possibility -- which, like any other, needs to be explained carefully -- is to show the means as text on the scale used to produce the data, but to scale bar heights as a fraction of the possible range.
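In symbols, one way to write this rescaling (the notation is introduced here for illustration and does not appear in the posts): for region r and variable j, with regional mean \bar{x}_{rj}, and m_j and M_j the lowest and highest values of variable j, the bar height is

```latex
h_{rj} = \frac{\bar{x}_{rj} - m_j}{M_j - m_j}, \qquad 0 \le h_{rj} \le 1
```

For example, the eight Region 1 values of sat_4 in the dataex listing sum to 27, so their mean is 3.375; with m = 1 and M = 5 the bar height is (3.375 - 1)/(5 - 1) = 0.59375, while the text on the bar still reports the mean on the 1 to 5 scale. For a 0/1 variable, m = 0 and M = 1, so the height is simply the mean, that is, the proportion of 1s.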

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input byte(sat_1 sat_2 sat_3 sat_4 sat_5) str8 region
            0 1 0 2 4 "Region 1"
            0 1 1 5 2 "Region 5"
            1 1 0 4 2 "Region 2"
            1 1 0 1 3 "Region 5"
            0 1 0 3 1 "Region 1"
            1 1 0 2 3 "Region 4"
            0 1 1 4 1 "Region 3"
            0 0 0 1 2 "Region 5"
            0 1 1 3 1 "Region 4"
            0 1 1 5 5 "Region 2"
            0 0 1 4 4 "Region 4"
            0 0 1 2 4 "Region 5"
            1 0 0 2 2 "Region 1"
            1 0 0 3 2 "Region 5"
            1 1 0 4 1 "Region 4"
            0 0 0 2 2 "Region 2"
            0 0 1 2 1 "Region 5"
            0 0 1 1 5 "Region 5"
            0 1 0 1 1 "Region 5"
            1 0 1 5 4 "Region 1"
            0 0 1 4 5 "Region 5"
            1 0 1 4 3 "Region 2"
            1 1 1 3 1 "Region 5"
            1 1 1 3 3 "Region 3"
            0 0 0 4 5 "Region 3"
            0 0 1 1 3 "Region 1"
            0 1 0 5 1 "Region 2"
            1 0 0 4 3 "Region 2"
            0 0 0 3 1 "Region 3"
            1 1 1 1 4 "Region 5"
            1 0 1 3 3 "Region 2"
            1 1 1 5 1 "Region 1"
            1 0 1 2 4 "Region 2"
            1 1 1 1 5 "Region 2"
            0 1 1 4 3 "Region 3"
            1 1 0 1 1 "Region 4"
            0 1 0 4 3 "Region 2"
            1 1 1 4 2 "Region 1"
            1 0 1 3 1 "Region 2"
            1 1 0 4 1 "Region 3"
            0 0 1 5 1 "Region 2"
            0 0 1 4 5 "Region 4"
            0 0 1 3 5 "Region 5"
            1 0 1 3 2 "Region 3"
            0 0 0 3 1 "Region 2"
            1 1 0 5 5 "Region 5"
            0 1 1 2 1 "Region 3"
            0 0 0 2 1 "Region 3"
            0 1 0 3 5 "Region 4"
            1 1 0 5 3 "Region 1"
            end
            
            preserve 
            
            gen id = _n 
            gen reg = real(substr(region, -1, 1))
            
            reshape long sat_, i(id) j(which)
            
            collapse (min) min=sat_ (max) max=sat_ (mean) mean=sat_, by(reg which)
            
            
            set scheme s1color 
            
            gen scaled = (mean - min)/ (max - min)  
            
            * tabplot from Stata Journal 
            tabplot reg which [iw=scaled] , subtitle(mean sat(isfaction?)) ytitle(region) xtitle(better explanation) frame(1) showval(mean, format(%3.2f))
            
            restore
            [Image: satisfaction2.png]



            • #7
              Oops! #6 calculates minima and maxima incorrectly. What is needed are the minimum and maximum of each variable over all regions, not within each (region, variable) group.
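A worked case from the dataex data shows why this matters. The seven Region 4 values of sat_4 are 2, 3, 4, 4, 1, 4, 3, with mean 21/7 = 3. Within that group the maximum is 4, but over the whole variable it is 5, so the two versions give different bar heights:

```latex
\text{\#6 (group range): } \frac{3 - 1}{4 - 1} = \tfrac{2}{3} \approx 0.67
\qquad
\text{\#7 (variable range): } \frac{3 - 1}{5 - 1} = 0.5
```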

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input byte(sat_1 sat_2 sat_3 sat_4 sat_5) str8 region
              0 1 0 2 4 "Region 1"
              0 1 1 5 2 "Region 5"
              1 1 0 4 2 "Region 2"
              1 1 0 1 3 "Region 5"
              0 1 0 3 1 "Region 1"
              1 1 0 2 3 "Region 4"
              0 1 1 4 1 "Region 3"
              0 0 0 1 2 "Region 5"
              0 1 1 3 1 "Region 4"
              0 1 1 5 5 "Region 2"
              0 0 1 4 4 "Region 4"
              0 0 1 2 4 "Region 5"
              1 0 0 2 2 "Region 1"
              1 0 0 3 2 "Region 5"
              1 1 0 4 1 "Region 4"
              0 0 0 2 2 "Region 2"
              0 0 1 2 1 "Region 5"
              0 0 1 1 5 "Region 5"
              0 1 0 1 1 "Region 5"
              1 0 1 5 4 "Region 1"
              0 0 1 4 5 "Region 5"
              1 0 1 4 3 "Region 2"
              1 1 1 3 1 "Region 5"
              1 1 1 3 3 "Region 3"
              0 0 0 4 5 "Region 3"
              0 0 1 1 3 "Region 1"
              0 1 0 5 1 "Region 2"
              1 0 0 4 3 "Region 2"
              0 0 0 3 1 "Region 3"
              1 1 1 1 4 "Region 5"
              1 0 1 3 3 "Region 2"
              1 1 1 5 1 "Region 1"
              1 0 1 2 4 "Region 2"
              1 1 1 1 5 "Region 2"
              0 1 1 4 3 "Region 3"
              1 1 0 1 1 "Region 4"
              0 1 0 4 3 "Region 2"
              1 1 1 4 2 "Region 1"
              1 0 1 3 1 "Region 2"
              1 1 0 4 1 "Region 3"
              0 0 1 5 1 "Region 2"
              0 0 1 4 5 "Region 4"
              0 0 1 3 5 "Region 5"
              1 0 1 3 2 "Region 3"
              0 0 0 3 1 "Region 2"
              1 1 0 5 5 "Region 5"
              0 1 1 2 1 "Region 3"
              0 0 0 2 1 "Region 3"
              0 1 0 3 5 "Region 4"
              1 1 0 5 3 "Region 1"
              end
              
              preserve 
              
              gen id = _n 
              gen reg = real(substr(region, -1, 1))
              
              reshape long sat_, i(id) j(which)
              
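              drop if missing(sat_)
              * minimum and maximum of each variable over all observations
              bysort which (sat_) : gen min = sat_[1]
              by which: gen max = sat_[_N]

              * min and max are constant within each variable, so collapsing them by their (default) mean leaves them unchanged
              collapse min max mean=sat_, by(reg which)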
              
              
              set scheme s1color 
              
              gen scaled = (mean - min)/ (max - min)  
              
              * tabplot from Stata Journal 
              tabplot reg which [iw=scaled] , subtitle(mean sat(isfaction?)) ytitle(region) xtitle(better explanation) frame(1) showval(mean, format(%3.2f))
              
              restore
              The graph is similar but not identical.
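The same per-variable extremes can also be obtained with egen instead of sorting. This is an alternative sketch, not the code from the thread. It assumes the dataex data from #1 are in memory and, unlike the code above, it has no preserve/restore, so it leaves one row per (region, variable) combination in memory.

```stata
* Sketch: assumes the -dataex- data from #1 are in memory
gen id = _n
gen reg = real(substr(region, -1, 1))
reshape long sat_, i(id) j(which)
drop if missing(sat_)

* extremes of each variable over all observations, not per region
bysort which: egen min = min(sat_)
bysort which: egen max = max(sat_)

* one row per (region, variable); min and max are constant within which
collapse (mean) mean=sat_ (first) min max, by(reg which)
gen scaled = (mean - min) / (max - min)
```

Either way, a 0/1 variable gets min 0 and max 1, and a 1 to 5 variable gets min 1 and max 5, whichever region is plotted.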



              • #8
                Nick Cox, thanks so much for the elegant solutions! I am really grateful!
