
  • Scatterplots with weighted marker size revisited

    Hello everybody,

    This is not strictly a technical question; it is more about how to find an appropriate visualization for multidimensional data.

    One way to approach this in Stata is to use weights in scatterplots to adjust marker size.
    However, the result looked rather odd, and the marker sizes did not seem to be proportional to the underlying weights.
    Apparently the algorithm behind it uses some kind of smoothing so that marker sizes do not get out of control in the presence of outliers.

    This is what the manual suggests. In some cases, however, it may be misleading. Nick Cox also raised this point in an older post: https://www.stata.com/statalist/arch.../msg01143.html

    He also mentioned that there are better ways to display trivariate data, but I couldn't come up with a better idea myself.
    So I thought Statalisters might have suggestions on how to approach such a graphics problem.


    Maybe it's easier to reason about this using an example, so here is the one from the manual:

    Code:
    sysuse census, clear
    
    generate drate = divorce / pop18p
    
    label var drate "Divorce rate"
    
    scatter drate medage [w=pop18p] if state!="Nevada", msymbol(Oh) ///
            note("Stata data excluding Nevada" ///
            "Area of symbol proportional to state's population aged 18+")
    [Attached image: Clipboard02.jpg, the weighted scatter plot produced by the code above]





    Best
    Boris



    Last edited by Boris Ivanov; 25 Feb 2020, 09:12. Reason: typo

  • #2
    Bubble charts worked for Hans Rosling in a justly famous TED talk. That's partly because of the examples he used. In most other cases they look to me like a useless mess.

    A canonical example is country populations, which vary by a factor of about 1 billion. Do you really want the biggest circle to be 1 billion times the area of the smallest?
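    A quick arithmetic sketch of why that hurts, assuming symbol area is proportional to population: symbol diameter then grows only with the square root of population, so an area ratio of 1 billion still makes the biggest circle roughly 31,600 times as wide as the smallest. The lines below compute that ratio, and the same area and diameter ratios for pop18p in the manual's census example; they are illustrative only.

```stata
* Assumption: symbol AREA is proportional to population,
* so symbol DIAMETER scales with the square root of population.
* Diameter ratio implied by an area ratio of about 1 billion:
display sqrt(1e9)

* Same calculation for the census example (population aged 18+ by state)
sysuse census, clear
summarize pop18p, meanonly
display "area ratio:     " r(max)/r(min)
display "diameter ratio: " sqrt(r(max)/r(min))
```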

    If the question is what else I had in mind in 2008, goodness knows, except that I think I meant scatter plot matrices, or dot or bar charts in parallel.

    Now, other answers are possible. Here's one: use different colour intensities for population on an approximately logarithmic scale. Some experimentation, not shown here, indicated that log base 2 of population, rounded down, gives 7 classes, and I don't want more. I didn't cheat by omitting Nevada, but I did cheat by also using a logarithmic scale for divorce rate.
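    One possible way to check how many classes the log base 2 rule described above produces, as a sketch using the same census data and the same binning expression; tabulating the class variable and reading r(r), the number of rows, is just one convenient check, not necessarily the experiment referred to.

```stata
* Classes: log base 2 of population aged 18+, rounded down
sysuse census, clear
generate toshow = floor(log(pop18p)/log(2))

* One row per class; r(r) holds the number of distinct classes
tabulate toshow
display "number of classes: " r(r)
```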

    Suppressing the legend is deliberate. If someone likes this, the story is simply that stronger marker colours mean larger populations, on a stepped logarithmic scale. I label only the states on the convex hull on these scales.

    Code:
    sysuse census, clear
    
    generate drate = divorce / pop18p
    
    label var drate "Divorce rate"
    
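    * classes: log base 2 of population aged 18+, rounded down
    gen toshow = floor(log(pop18p)/ log(2))
    * one divorce-rate variable per class, picked up below as drate??
    separate drate , by(toshow)
    
    * label only the states on the convex hull
    gen tolabel = state2 if inlist(state2, "PA", "ND", "NV", "FL", "UT", "AK")
    * repeat the label variable once for each of the 7 plots
    local mlabel = 7 * "tolabel " 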
    
    set scheme s1color 
    
    scatter drate?? medage, mfc(blue*0.03 blue*0.06 blue*0.12 blue*0.25 blue*0.5 blue blue*2) ///
    mlc(blue ..) mla(`mlabel') legend(off) ytitle(Divorce rate) mlabc(blue ..) ysc(log) xla(24/35, format(%2.0f))
    [Attached image: notabubble.png, the scatter plot produced by the code above]



