  • Connected graph with markers representing categories of coefficient of variation

    I'm trying to make a connected graph where the markers are different sizes, representing different levels of each observation's coefficient of variation (CV). I categorized CV as 1, 2, or 3 (depending on how large it is), and I would like the connected chart to have the smallest dot if CV=1 and the largest dot if CV=3. I know that one way to customize the connected "dots" is to layer a connected plot and a scatter plot, as I have done here:

    twoway (connected vcnr0 year, sort color(black))(scatter vcnr0 year [aweight = cv0], color(black))(connected vcnr1 year, color(blue)) (scatter vcnr1 year [aweight = cv1], color(blue))

    However, let's say variable vcnr0 has CVs of 1 and 2, and the variable vcnr1 has CVs of 2 and 3. The above chart does not scale the dots correctly; the size of CV=1 for vcnr0 is the same as CV=2 for vcnr1.

    I tried adding an extra "invisible" scatter plot that contained all three categories:

    twoway (connected vcnr0 year, sort color(black))(scatter vcnr0 year [aweight = cv0], color(black))(connected vcnr1 year, color(blue)) (scatter vcnr1 year [aweight = cv1], color(blue)) (scatter vcnr0 year [aweight=dots], mstyle(none))

    but this did not help. Stata scaled all three weighted scatter plots separately. Can anyone think of a way to write the graphing code and/or organize my data differently to get the scatter plots to scale using ALL the possible weights, not just the weights present in the individual scatter plot?

    Thank you!

    Lisa

  • #2
    Hi Lisa

    It appears that Stata uses the minimum weight within each scatter plot to determine its smallest dot size, so the scale is relative to each plot rather than common across plots. One easy way to overcome this is to add one or more phantom data points to your data set. In your example, you could add an extra year (at the end of the sample period) in which you assign a value of 1 to the CV of vcnr1. The idea is that variables with higher weights should also have observations with all the lower weights, which should make the plots consistent. After plotting, you can delete the added point(s) from the graph and from the dataset itself (lest you use them in your analysis!).




    • #3
      Andrew's suggestion would work if I had only one figure to polish, but I'm making a lot of these figures and regenerating them frequently, so manually deleting the "phantom" points in the Graph Editor won't work. Is there a way I can do this using code?
      Thanks!



      • #4
        I think supplying code is possible, but please give us a (small) token dataset to act as sandbox.



        • #5
          Below is a small dataset. The code that I pasted above:
          Code:
          twoway (connected vcnr0 year, sort color(black))(scatter vcnr0 year [aweight = cv0], color(black))(connected vcnr1 year, color(blue)) (scatter vcnr1 year [aweight = cv1], color(blue))
          illustrates the problem with how the scatter plots are scaled.
          Code:
          vcnr1    vcnr0    cv1    cv0    year
          225847.1    105429.6    2    1    2009
          295090.6    107995.2    3    2    2010
          629429.4    146419.4    2    2    2011
          488573.3    129474.9    2    2    2012
          812678.5    129561.3    2    1    2013
          Apologies if I did not post the data correctly. Even after checking the FAQ I wasn't quite clear on how to do it.



          • #6
            That's fine. What's suggested is that you show data as produced by input or displayed by list, but the sample above is easy to work with.

            Some technique:

            Code:
             
            clear 
            input vcnr1    vcnr0    cv1    cv0    year
            225847.1    105429.6    2    1    2009
            295090.6    107995.2    3    2    2010
            629429.4    146419.4    2    2    2011
            488573.3    129474.9    2    2    2012
            812678.5    129561.3    2    1    2013
            end 
            reshape long vcnr cv , i(year) j(which) 
            line vcnr year, sort || scatter vcnr year [aw=cv], by(which, legend(off))



            • #7
              Hi Lisa, Hi Nick

              Having one variable for CV and a dummy to indicate the graph (which Nick calls "which") still does not solve Lisa's original problem. The issue is that Stata still plots two graphs, and the smallest dot size in each graph's scatter plot is determined by the minimum weight for that graph.


              Lisa wrote: "However, let's say variable vcnr0 has CVs of 1 and 2, and the variable vcnr1 has CVs of 2 and 3. The above chart does not scale the dots correctly; the size of CV=1 for vcnr0 is the same as CV=2 for vcnr1."
              Therefore, my suggestion was to add an extra observation to make the scales common to all graphs. I can offer an additional suggestion that will save Lisa the trouble of manually deleting the points in the Graph Editor. However, bear in mind that this is a second- or third-best solution; the first-best would not involve generating any "phantom" data at all, so any suggestions on this are highly welcome.

              Procedure using the data provided

              1) Generate an extra year observation
              2) Plot the graphs and restrict the x-scale and x-label
              3) Delete the added observation

              Hint: By using a missing value for the added observation, you do not need to manually delete any point using graph editor.


              Code:
              clear 
              input vcnr1    vcnr0    cv1    cv0    year
              225847.1    105429.6    2    1    2009
              295090.6    107995.2    3    2    2010
              629429.4    146419.4    2    2    2011
              488573.3    129474.9    2    2    2012
              812678.5    129561.3    2    1    2013
              . . 1 1 2014
              end
              *Note that I add an extra year 2014
              
              twoway (connected vcnr0 year, sort color(black)) ///
                     (scatter vcnr0 year [aweight = cv0], color(black)) ///
                     (connected vcnr1 year, color(blue)) ///
                     (scatter vcnr1 year [aweight = cv1], color(blue)), ///
                     xscale(range(2009 2013) noextend) xlabel(2009(1)2013)
              
              drop if year>2013
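
              For figures that are regenerated often, the three steps above could be wrapped in preserve and restore so that the added observation never has to be dropped by hand. This is only a minimal sketch: it assumes the example variables (vcnr0, vcnr1, cv0, cv1, year) are in memory, uses the local macro names first and last purely for illustration, and relies on the hint above that a phantom row with missing y values still affects the marker scaling.

              ```stata
              * Sketch only: phantom observation inside preserve/restore.
              * Assumes vcnr0, vcnr1, cv0, cv1 and year are already in memory.
              summarize year, meanonly
              local first = r(min)                         // first real year
              local last  = r(max)                         // last real year

              preserve
                  set obs `=_N + 1'                        // append one empty observation
                  replace year = `last' + 0.001 if _n == _N   // just past the last real year
                  replace cv0  = 1 if _n == _N             // smallest weight for both series
                  replace cv1  = 1 if _n == _N
                  * vcnr0 and vcnr1 stay missing in the added row, so it draws no marker
                  twoway (connected vcnr0 year, sort color(black)) ///
                         (scatter vcnr0 year [aweight = cv0], color(black)) ///
                         (connected vcnr1 year, sort color(blue)) ///
                         (scatter vcnr1 year [aweight = cv1], color(blue)), ///
                         xscale(range(`first' `last') noextend) xlabel(`first'(1)`last')
              restore                                      // the original observations are back
              ```

              Because restore brings back the data as they were at preserve, no drop is needed afterwards, and the same lines can be rerun for each figure.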



              • #8
                For scaling purposes, it may also be preferable to compress the added year, e.g., 2013.001 instead of 2014 in the example above. In this way, the final graph utilizes the entire space.

                Code:
                clear
                input vcnr1    vcnr0    cv1    cv0    year
                225847.1    105429.6    2    1    2009
                295090.6    107995.2    3    2    2010
                629429.4    146419.4    2    2    2011
                488573.3    129474.9    2    2    2012
                812678.5    129561.3    2    1    2013
                . . 1 1 2013.001
                end
                *Note that I add an extra year 2013.001
                
                twoway (connected vcnr0 year, sort color(black)) ///
                       (scatter vcnr0 year [aweight = cv0], color(black)) ///
                       (connected vcnr1 year, color(blue)) ///
                       (scatter vcnr1 year [aweight = cv1], color(blue)), ///
                       xscale(range(2009 2013) noextend) xlabel(2009(1)2013)
                
                drop if year > 2013
                Last edited by Andrew Musau; 22 Jan 2015, 00:44.



                • #9
                  Andrew: You're quite correct. My code doesn't fix this. Even for the same variable, Stata still scales marker size within plots, making comparison across plots hazardous. I didn't check! But I have to regard this as a puzzling feature. To me that's another reason to dislike bubble plots, namely that it's very hard to get them right.

                  Here's another approach:

                  Code:
                   
                  clear 
                  set scheme s1color 
                  input vcnr1    vcnr0    cv1    cv0    year
                  225847.1    105429.6    2    1    2009
                  295090.6    107995.2    3    2    2010
                  629429.4    146419.4    2    2    2011
                  488573.3    129474.9    2    2    2012
                  812678.5    129561.3    2    1    2013
                  end 
                  reshape long vcnr cv , i(year) j(which) 
                  label def which 0 "one lot" 1 "another lot"
                  label val which which 
                  separate vcnr, by(cv) veryshortlabel 
                  line vcnr year, sort || scatter vcnr? year, by(which, legend(off) note("")) ms(O ..) msize(*0.5 *1 *1.5) mcolor(dkgreen ..) ytitle(vcnr)
                  [Image: lisa.png]

                  Last edited by Nick Cox; 22 Jan 2015, 03:54.
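
                  The same idea can be carried over to a single overlaid graph, closer to the layout in the original question. The following is a sketch under assumptions, not something tested in this thread: it reuses the example data and the separate step above, and draws each series in its own colour with marker sizes fixed per CV category rather than derived from weights.

                  ```stata
                  * Sketch only: one panel, marker size fixed by CV category.
                  clear
                  input vcnr1    vcnr0    cv1    cv0    year
                  225847.1    105429.6    2    1    2009
                  295090.6    107995.2    3    2    2010
                  629429.4    146419.4    2    2    2011
                  488573.3    129474.9    2    2    2012
                  812678.5    129561.3    2    1    2013
                  end
                  reshape long vcnr cv, i(year) j(which)
                  separate vcnr, by(cv) veryshortlabel     // creates vcnr1, vcnr2, vcnr3

                  * One line plus one set of sized markers per series
                  twoway (line vcnr year if which == 0, sort lcolor(black)) ///
                         (line vcnr year if which == 1, sort lcolor(blue)) ///
                         (scatter vcnr1 vcnr2 vcnr3 year if which == 0, ///
                             ms(O ..) mcolor(black ..) msize(*0.5 *1 *1.5)) ///
                         (scatter vcnr1 vcnr2 vcnr3 year if which == 1, ///
                             ms(O ..) mcolor(blue ..) msize(*0.5 *1 *1.5)), ///
                         legend(off) ytitle(vcnr)
                  ```

                  Since msize() is set explicitly for each CV category, a given category gets the same marker size in both series, whichever categories happen to occur in each.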



                  • #10
                    Very nice Nick!



                    • #11
                      Thanks very much for your help!



                      • #12
                        Dear users,
                        does anybody know whether there is a Stata program that produces a scatter graph similar to this example, which was produced in R?

                        https://jamanetwork.com/data/Journal...oi160089f1.png



                        • #13
                          In the future, please start a new thread if your problem is different from that of an existing thread. For your question, see
                          Code:
                          help twoway

                          You provide no example data, but the following may help.

                          Code:
                          use http://www.stata-press.com/data/r13/tsrevarex
                          gen gnp2=0.25*gnp
                          set scheme s1color
                          
                          tw (scatter gnp year, msize(small) mcolor(red)) (lowess gnp year, lcolor(red)) ///
                          (scatter gnp2 year, msize(small) mcolor(blue)) (lowess gnp2 year, lcolor(blue)), ///
                          xlab(1990(5)2012) text(130 2011.25 "GNP X", size(small)) ///
                          text(32.25 2011.25 "GNP Y", size(small)) ///
                          leg(off) xtitle("Year") ytitle("$ billions") ylab(, grid)

                          [Image: gnp.png]



                          • #14
                            Originally posted by Andrew Musau
                            Hi Lisa

                            It appears that Stata uses the minimum weight within each scatter plot to determine its smallest dot size, so the scale is relative to each plot rather than common across plots. One easy way to overcome this is to add one or more phantom data points to your data set. In your example, you could add an extra year (at the end of the sample period) in which you assign a value of 1 to the CV of vcnr1. The idea is that variables with higher weights should also have observations with all the lower weights, which should make the plots consistent. After plotting, you can delete the added point(s) from the graph and from the dataset itself (lest you use them in your analysis!).

                            While this suggested workaround wasn't practical for Lisa, who had multiple figures to create, I only had a single figure (in two panels, obtained using "by()"), so it worked for me. Thanks.

