  • How to make the size of points in scatterplot be the area, as defined by a third variable?

    First, note that I asked this on StackOverflow, and it was recommended that I contact StataCorp technical support (https://stackoverflow.com/questions/...ea-or-somethin).

    How can I control the size of the dots in a scatterplot so that the area (as opposed to the diameter) of each dot is exactly proportional to a third variable?

    I know that I can use weights, as in this example, where I weight by weight (pun intended):

    Code:
    sysuse auto2, clear
    scatter price mpg[w=weight]
    What exactly are these weights doing? I may not be looking in the right place, but it seems unclear how the weighting function scales the size (https://www.statalist.org/forums/for...scatter%20area) and (https://www.stata.com/statalist/arch.../msg01143.html).

    If there isn't a solution, I could make my graph in R, though this would be inconvenient.
    Last edited by Todd Jones; Today, 10:25.

  • #2
    Code:
    clear all
    
    set obs 100
    g pop = 100 + 500*(_n>30) + 1000*(_n>70)
    
    g x = rgamma(5,1)
    g y = 1 + 0.5*x + rnormal()
    
    scatter x y [aw=pop] , color(black%20)



    • #3
      Originally posted by Todd Jones
      What exactly are these weights doing? I may not be looking in the right place, but it seems unclear how the weighting function scales the size
      The weights are scaled ordinally, so Stata does not interpret the weight literally. If you wanted a literal interpretation, you would need to include all missing levels of the weight variable, that is, every value up to the maximum that does not occur in the data. That would be impractical if you have a continuous variable and the weights are in units of thousands, e.g., car weights in the auto dataset. However, you could still group the weights into units of 1000, 500, 100 or whatever seems reasonable.



      • #4
        Thanks!
        Last edited by Todd Jones; Today, 12:29.



        • #5
          I am not good at judging area with the naked eye, so I cannot determine whether the claim that the weighting is proportional to circumference rather than area is true. However, let's assume this holds and that you have weights representing the areas of circles. Since you already know how to compute the area of a circle, you can determine the radius and then calculate the circumference. From there, you can create weights based on the circumferences.

          $$ \text{Area}= \pi r^2 \Rightarrow r = \sqrt{\frac{\text{Area}}{\pi}} $$

          and

          $$ \text{Circumference}= 2\pi r.$$
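
          Substituting the first expression into the second shows that, for this purpose, the circumference is simply a rescaled square root of the area (a small worked step, not something taken from Stata's documentation):

          $$ \text{Circumference}= 2\pi \sqrt{\frac{\text{Area}}{\pi}} = 2\sqrt{\pi \cdot \text{Area}} \propto \sqrt{\text{Area}} $$

          For example, \(\text{Area}=100\) gives \(2\sqrt{100\pi} \approx 35.449\).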

          Code:
          clear
          input float(x y area) str20(label)
          2 2  100 "{bf:A}"
          4 4  200 "{bf:B}/ 2*A"
          6 6  300 "{bf:C}/ 3*A"
          8 8 400  "{bf:D}/ 4*A / 2*B"
          10 10 600 "6*A/ 2*C"
          12 12 800 "8*A/ 4*B / 2*D"
          end
          
          gen radius= sqrt(area/_pi)
          gen circumference= 2*_pi*radius
          gen c=round(circumference, 2)
          list, sep(0)
          
          * Create a dataset with all levels 1..max(c) to append, so that
          * the weights are interpreted literally rather than ordinally
          frame create obs
          qui sum c
          frame obs{
              set obs `r(max)'
              gen c=_n
              tempfile toappend
              save `toappend'
          }
          append using `toappend'
          frame drop obs
          
          tw (scatter x y [aw=c], ms(Oh) xsc(r(. 14)) leg(off)) ///
          (scatter x y, ms(none) mlab(label) mlabgap(*2in)), ///
          title(CIRCUMFERENCES AS WEIGHT)

          Res.:

          Code:
          . list, sep(0)
          
               +----------------------------------------------------------------+
               |  x    y   area               label     radius   circum~e     c |
               |----------------------------------------------------------------|
            1. |  2    2    100              {bf:A}   5.641896   35.44908    36 |
            2. |  4    4    200         {bf:B}/ 2*A   7.978846   50.13256    50 |
            3. |  6    6    300         {bf:C}/ 3*A    9.77205    61.3996    62 |
            4. |  8    8    400   {bf:D}/ 4*A / 2*B   11.28379   70.89816    70 |
            5. | 10   10    600            6*A/ 2*C   13.81977   86.83215    86 |
            6. | 12   12    800      8*A/ 4*B / 2*D   15.95769   100.2651   100 |
               +----------------------------------------------------------------+
          [Figure: Graph.png]

          Last edited by Andrew Musau; Today, 13:46.



          • #6
            Thank you! So now it is just a matter of figuring out how Stata scales (by area, diameter, circumference, or something else), and then adjusting the variable accordingly?



            • #7
              Exactly. Assuming \(\frac{Size_{x+1}}{Size_x} =\frac{Size_{x+2}}{Size_{x+1}} = \cdots = \frac{Size_{x+n}}{Size_{x+n-1}}\), all you need to know is the scaling factor, which allows you to adjust your weights accordingly.
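
              As a sketch of that adjustment, suppose (an assumption, not something confirmed in this thread) that the marker diameter \(d\) grows as a power \(k\) of the weight \(w\). The symbols \(d\), \(w\), \(k\) and the target variable \(A\) are illustrative only:

              $$ d \propto w^{k} \Rightarrow \text{Area}_{\text{marker}} \propto d^{2} \propto w^{2k} $$

              Choosing \(w = A^{1/(2k)}\) would then give \(\text{Area}_{\text{marker}} \propto A\). If the diameter is linear in the weight (\(k=1\)), this is \(w=\sqrt{A}\), in line with the circumference-based weights in #5; if the area is already linear in the weight (\(k=1/2\)), then \(w=A\) and no adjustment is needed.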
