
  • Adding individual r2 to all twoway scatter plots

    Hi,

    I am trying to make a group of scatter plots on the same graph using the method shown here: https://www.statalist.org/forums/for...d-r-2-included

    But in my case, there are five sub-plots as the by(Education) has five categories, and it is adding the same r2 value to all the charts.

    Code:
    by Education: reg bmi wscore
    local r2: display %5.4f e(r2)
    
    twoway (scatter bmi wscore, by(Education, note("") graphregion(color(white)))) (lfit bmi wscore),  note(R-squared=`r2')
    I tried another Statalist tutorial, but the result remains the same:

    Code:
    corr bmi wscore
    local corr : di %5.3g r(rho)
    twoway (scatter bmi wscore, by(v717, note("") graphregion(color(white))))  (lfit bmi wscore),  mlabel(code)  subtitle("correlation `corr'")
    Please let me know if there is any way to report the individual r2 for each of the sub-plots.

    Thank you.



  • #2
    This graphical problem was discussed in

    Stata tip 144: Adding variable text to graphs that use a by() option

    https://journals.sagepub.com/doi/pdf...6867X211063413

    Here's some sample code using one method. I used rangestat from SSC to get correlations, but there are many other ways to do that.


    Code:
    sysuse auto, clear
    rangestat (corr) mpg weight, int(foreign 0 0)
    tabdisp foreign, c(corr_x)
    scatter mpg weight, by(foreign)
    
    gen toshow = "{it:R}{sup:2} = " + string(100 * corr_x^2, "%3.1f") + "%"
    gen y = 40
    gen x = 3800
    
    scatter mpg weight, by(foreign, note("") legend(off)) ytitle("`: var label mpg'") xtitle("`: var label weight'") ///
    || scatter y x , ms(none) mla(toshow) mlabsize(medlarge)
    There is some fiddliness needed here about where to put the R-square values, and so forth. A method discussed in the paper adds them to value labels, so that they appear as subtitles.

    Detail: After your regressions using by:, only one R-square is in the saved results, the last one calculated. The overall correlation, quite apart from not being the overall R-squared, is not what you want either, if I understand correctly.
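
    As one of the other ways to get the group-specific values, here is a minimal sketch that uses only official commands: loop over the groups, run correlate within each, and store the squared correlation in a variable. The variable name rsq is just an illustrative choice, and the loop assumes the grouping variable is numeric.

    ```stata
    sysuse auto, clear

    * hypothetical variable to hold the squared correlation for each group
    gen double rsq = .

    levelsof foreign, local(levels)
    foreach f of local levels {
        * correlate with two variables leaves the correlation in r(rho)
        quietly correlate mpg weight if foreign == `f'
        quietly replace rsq = r(rho)^2 if foreign == `f'
    }

    * one value per group, comparable to corr_x^2 from rangestat
    tabdisp foreign, c(rsq)
    ```

    For a regression with a single predictor this squared correlation equals the R-squared, so rsq can stand in for corr_x^2 when building the text to show.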



    • #3
      First, the code
      Code:
      by Education: reg bmi wscore
      local r2: display %5.4f e(r2)
      runs the regressions for each education group, but only picks up r2 for the last one. So that's the first problem: you have to actually save each r2 in a separate local macro:

      Code:
      levelsof Education, local(educ)
      foreach e of local educ {
          regress bmi wscore if Education == `e'
          local r2_`e' = e(r2)
      }
      (Note: Assumes Education is a numeric variable. If it is a string variable, replace -if Education == `e'- with -if Education == `"`e'"'-.)

      Next, I don't think you can customize each individual graph in this way using the -by()- option. I think you have to create each graph and then combine them. Actually, it would be most efficient to do that in the same loop that gets the R2 values. So instead of the above code, do:

      Code:
      levelsof Education, local(educ)
      local graphs
      foreach e of local educ {
          regress bmi wscore if Education == `e'
          local r2_`e' = e(r2)
          twoway (scatter bmi wscore if Education == `e') ///
          (lfit bmi wscore if Education == `e'), graphregion(color(white)) ///
          note("R-squared = `r2_`e''") name(graph_`e', replace)
          local graphs `graphs' graph_`e'
      }
      graph combine `graphs', nocopies
      The above code is, of course, untested, as no example data was provided. Beware of typos, unbalanced parentheses, etc. Again, the code assumes Education is numeric. If it is string, adjust accordingly.
      You might also want to add a -title()- option to the graphing command so that something indicates which panel in the final combined graph is which level of education.
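
      A sketch of that idea, using the auto data as a stand-in because no example data was posted (so mpg, weight and foreign replace bmi, wscore and Education). It assumes the grouping variable is numeric with a value label attached; the local macro names lbl and r2 are arbitrary.

      ```stata
      sysuse auto, clear

      levelsof foreign, local(levels)
      local graphs
      foreach f of local levels {
          quietly regress mpg weight if foreign == `f'
          * format R-squared for display
          local r2 : display %5.4f e(r2)
          * value label of this level, used as the panel title
          local lbl : label (foreign) `f'
          twoway (scatter mpg weight if foreign == `f') ///
                 (lfit mpg weight if foreign == `f'), ///
                 title("`lbl'") note("R-squared = `r2'") ///
                 legend(off) name(graph_`f', replace)
          * collect the graph names (not their contents) for graph combine
          local graphs `graphs' graph_`f'
      }
      graph combine `graphs'
      ```

      Each panel then carries its own title and its own R-squared, at the cost of losing the shared axes that -by()- gives you unless you add options such as -ycommon- and -xcommon- to -graph combine-.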

      Added: Crossed with #2. So, it can be done with -by()-, although it requires a fairly opaque ("fiddly") approach.
      Last edited by Clyde Schechter; 06 Feb 2022, 12:38.



      • #4
        Here's the other way mentioned to do it, which I think I prefer slightly.


        Code:
        sysuse auto, clear
        rangestat (corr) mpg weight, int(foreign 0 0)
        
        forval f = 0/1 { 
            local label : label (foreign) `f' 
            su corr_x if foreign == `f', meanonly 
            local rsq = "({it:R}{sup:2} = " + string(100 * r(max)^2, "%3.1f") + "%)"
            label define NEW `f' "`label' `rsq'", modify 
        }
        
        label val foreign NEW 
        
        scatter mpg weight, by(foreign, note("")) subtitle(, nobox)
        [Image: rsqongraph.png]



        • #5
          Yes, FWIW, I think Nick's approach in #4 is much better than both his #2 and my #3.
