How to make the size of points in scatterplot be the area, as defined by a third variable?

Todd Jones

Join Date: Oct 2020

Posts: 43
#1

25 Feb 2025, 09:18

First, note that I asked this on StackOverflow, and it was recommended that I contact StataCorp technical support (https://stackoverflow.com/questions/...ea-or-somethin).

How can I exactly control the size of the dots in a scatterplot such that the size of each dot corresponds exactly to the area (as opposed to diameter) as defined by a third variable?

I know that I can use weights, as in this example, where I weight by weight (pun intended):

Code:

sysuse auto2, clear
scatter price mpg [w=weight]

What exactly are these weights doing? I may not be looking in the right place, but it seems unclear how the weighting function scales the size (https://www.statalist.org/forums/for...scatter%20area) and (https://www.stata.com/statalist/arch.../msg01143.html).

If there isn't a solution, I could make my graph in R, though this would be inconvenient.

Last edited by Todd Jones; 25 Feb 2025, 09:25.

George Ford

Join Date: Aug 2014
Posts: 3152
#2

25 Feb 2025, 09:26

Code:

clear all

set obs 100
g pop = 100 + 500*(_n>30) + 1000*(_n>70)

g x = rgamma(5,1)
g y = 1 + 0.5*x + rnormal()

scatter x y [aw=pop] , color(black%20)

Andrew Musau

Join Date: Oct 2014

Posts: 10195
#3

25 Feb 2025, 10:19

Originally posted by Todd Jones

What exactly are these weights doing? I may not be looking in the right place, but it seems unclear how the weighting function scales the size

The weights are scaled ordinally, so Stata does not interpret the weight values literally. For a literal interpretation, the data would have to contain every level of the weight variable, including the levels that are currently missing. That is impractical for a continuous variable whose weights are in units of thousands, e.g., car weights in the auto dataset. However, you could still group the weights into units of 1000, 500, 100, or whatever seems reasonable.
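
The grouping suggested above could be sketched as follows. This is a minimal illustration, assuming the shipped auto dataset and a bin width of 500; the variable name wgrp and the bin width are hypothetical choices, not something from the post, and the sketch makes no claim about how Stata maps the resulting weights to marker sizes.

```stata
sysuse auto, clear

* Round each car's weight to the nearest multiple of 500 (assumed bin width)
generate wgrp = round(weight, 500)

* Inspect which grouped levels actually occur in the data
tabulate wgrp

* Use the grouped variable as the marker weight
scatter price mpg [aw=wgrp], msymbol(Oh)
```

A coarser or finer bin width only changes the second argument of round().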
Todd Jones

Join Date: Oct 2020

Posts: 43
#4

25 Feb 2025, 11:16

Thanks!

Last edited by Todd Jones; 25 Feb 2025, 11:29.

Andrew Musau

Join Date: Oct 2014
Posts: 10195
#5

25 Feb 2025, 12:36

I am not good at judging area with the naked eye, so I cannot determine whether the claim that the weighting is proportional to circumference rather than area is true. However, let's assume this holds and that you have weights representing the areas of circles. Since you already know how to compute the area of a circle, you can determine the radius and then calculate the circumference. From there, you can create weights based on the circumferences.

$$ \text{Area}= \pi r^2 \Rightarrow r = \sqrt{\frac{\text{Area}}{\pi}} $$

and

$$ \text{Circumference}= 2\pi r.$$
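
Substituting the first expression into the second gives the circumference directly in terms of the area, so the circumference grows with the square root of the area. This step is added for illustration; the numerical check uses the area of 100 from the first row of the example data that follows.

$$ \text{Circumference} = 2\pi\sqrt{\frac{\text{Area}}{\pi}} = 2\sqrt{\pi\,\text{Area}} $$

$$ \text{Area} = 100 \Rightarrow \text{Circumference} = 2\sqrt{100\pi} \approx 35.45 $$

Doubling the area therefore multiplies the circumference by $\sqrt{2} \approx 1.414$, not by 2.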

Code:

clear
input float(x y area) str20(label)
2 2  100 "{bf:A}"
4 4  200 "{bf:B}/ 2*A"
6 6  300 "{bf:C}/ 3*A"
8 8 400  "{bf:D}/ 4*A / 2*B"
10 10 600 "6*A/ 2*C"
12 12 800 "8*A/ 4*B / 2*D"
end

gen radius= sqrt(area/_pi)
gen circumference= 2*_pi*radius
gen c=round(circumference, 2)
list, sep(0)

*CREATE DATASET WITH ALL LEVELS TO APPEND
frame create obs
qui sum c
frame obs{
    set obs `r(max)'
    gen c=_n
    tempfile toappend
    save `toappend'
}
append using `toappend'
frame drop obs

tw (scatter x y [aw=c], ms(Oh) xsc(r(. 14)) leg(off)) ///
(scatter x y, ms(none) mlab(label) mlabgap(*2in)), ///
title(CIRCUMFERENCES AS WEIGHT)

Res.:

Code:

. list, sep(0)

     +----------------------------------------------------------------+
     |  x    y   area               label     radius   circum~e     c |
     |----------------------------------------------------------------|
  1. |  2    2    100              {bf:A}   5.641896   35.44908    36 |
  2. |  4    4    200         {bf:B}/ 2*A   7.978846   50.13256    50 |
  3. |  6    6    300         {bf:C}/ 3*A    9.77205    61.3996    62 |
  4. |  8    8    400   {bf:D}/ 4*A / 2*B   11.28379   70.89816    70 |
  5. | 10   10    600            6*A/ 2*C   13.81977   86.83215    86 |
  6. | 12   12    800      8*A/ 4*B / 2*D   15.95769   100.2651   100 |
     +----------------------------------------------------------------+

[Figure: Graph.png, the scatterplot produced by the code above, titled "CIRCUMFERENCES AS WEIGHT"]

Last edited by Andrew Musau; 25 Feb 2025, 12:46.

Todd Jones

Join Date: Oct 2020

Posts: 43
#6

25 Feb 2025, 14:00

Thank you! So now it is just a matter of figuring out how Stata scales (by area, diameter, circumference, or something else), and then adjusting the variable accordingly?
Andrew Musau

Join Date: Oct 2014

Posts: 10195
#7

25 Feb 2025, 15:10

Exactly. Assuming $\frac{Size_{x+1}}{Size_x} =\frac{Size_{x+2}}{Size_{x+1}} = \cdots = \frac{Size_{x+n}}{Size_{x+n-1}}$, all you need to know is the scaling factor, which allows you to adjust your weights accordingly.
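
One way to make that adjustment concrete, as a sketch under an explicit assumption: suppose the marker's diameter grows as the weight raised to some power k, so that its area grows as the weight raised to 2k. Weights of area^(1/(2k)) would then give marker areas proportional to area. The exponent k is hypothetical and would have to be determined by experiment; it is not a documented Stata value. With k = 1 the weights are proportional to the circumferences computed earlier in the thread, and with k = 0.5 they are the areas themselves.

```stata
clear
input float(x y area)
2 2 100
4 4 200
6 6 300
8 8 400
end

* ASSUMPTION: marker diameter is proportional to weight^k (k is hypothetical)
local k = 1

* Area is then proportional to weight^(2k), so invert that relationship
generate double w = area^(1/(2*`k'))

list x y area w, sep(0)
scatter y x [aw=w], msymbol(Oh)
```

Changing the local k and redrawing the graph is a quick way to compare candidate scaling rules against circles of known area.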