
  • How to create a scatter plot with p-value and r^2 included

    Hi,
    Is it possible to make a scatter plot that includes a box with the p-value and r^2 value at the top (the same way excel shows an r^2 value in an xy-plot)?


  • #2
    Yes, it's entirely possible. aaplot from SSC -- for which see e.g. http://www.statalist.org/forums/foru...updated-on-ssc -- shows by example that you can program to pick up results from a regression and put them on a graph.

    aaplot deliberately does not show P-values, which are often incorrect or scientifically meaningless, and overrated anyway, so you'd need to study the code if you wanted to adapt it to your purposes.

    Alternatively, any scatter plot can show added text in titles (including captions and notes) or added text within the plot region.
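
As a minimal sketch of adding text within the plot region, the `text()` option of `twoway` can draw a boxed label at chosen coordinates. The auto dataset, the coordinates (14000, 2000), and the choice of which results to show are illustrative assumptions, not part of the original answer.

```stata
* Sketch only: dataset and label position are illustrative choices
sysuse auto, clear
quietly regress price weight

* Store formatted results in local macros before drawing the graph
local r2 = strtrim(string(e(r2), "%5.3f"))
local n  = e(N)

* text(y x "line 1" "line 2", ...) stacks the lines at the point (y, x);
* box draws a border around them, placement(east) puts them right of the point
twoway (scatter price weight) (lfit price weight), ///
    text(14000 2000 "R-squared = `r2'" "n = `n'", ///
         placement(east) justification(left) box margin(small)) ///
    legend(off)
```

The coordinates are in the units of the y and x axes, so they need adjusting for each dataset.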



    • #3
      Thank you, Nick. I'll try this if it is not possible to make a plot with a p-value.
      I have tried adding notes to my original plot and managed to add the r^2 value but not the p-value. Can you explain how to do it?

      I know the p-value is sometimes meaningless and very dependent on the number of observations, but I have to include it and would really like it to be part of the graph.



      • #4
        You would need to retrieve it after the regression as

        fprob(e(df_m),e(df_r),e(F))
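
A hedged sketch of how that expression might be used: `fprob()` is an older function name, and to my understanding `Ftail()` is the currently documented equivalent with the same arguments. The auto dataset and the scalar names `p_F` and `p_t` are illustrative assumptions. With a single predictor, the overall F test and the two-sided t test of the slope should give the same P-value.

```stata
* Sketch only: auto dataset used for illustration
sysuse auto, clear
quietly regress price weight

* P-value of the overall F test (Ftail() assumed equivalent to the older fprob())
scalar p_F = Ftail(e(df_m), e(df_r), e(F))

* Two-sided P-value of the t test on the slope
scalar p_t = 2*ttail(e(df_r), abs(_b[weight]/_se[weight]))

display "P (F test) = " %6.4f p_F "   P (t test) = " %6.4f p_t
```

With more than one predictor the two values differ, because the F test then refers to all slopes jointly.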




        • #5
          Another way:


          Code:
          qui reg y x     // Run the regression quietly; quietly is optional
          
          loc r2 : di %12.2f e(r2)*100   // R-squared as a percentage, to two decimal places
          loc t =  _b[x]/_se[x]  // t statistic for the slope
          loc p : di %12.3f 2*ttail(e(df_r),abs(`t'))   // Two-sided p-value
          twoway (scatter y x), title("R-square (%): `r2'") note("p-value: `p'") // Scatter plot
          For the calculation of the p-value, see Maarten Buis, "Stata tip 53: Where did my p-values go?"
          Last edited by Roman Mostazir; 18 Apr 2016, 10:21. Reason: Author name corrected
          Roman



          • #6
            Corrected, thanks Nick.
            Roman



            • #7
              Hi Statalisters,

              With credit to Roman and Nick, here's a snippet of code that produces a scatter plot with a regression line and the relevant summary statistics. I hope this code helps produce a visualization of the data in the spirit of an excel graphic and can be modified for use with many data sets. Note: the solution makes light use of Unicode, so it requires Stata 14 or later.

              Code:
              sysuse auto
              
              regress price weight
              test _b[weight]=0
              
              
              mat b = e(b)
              local constant : display %4.3f = b[1,2]
              display  `constant'
              local x : display %4.3f = b[1,1]
              local r2 : display %5.4f = e(r2)
              
              local hats : display     _skip(1) "̂"  _skip(26) "̂"
              local heads_a     "y="
              local heads_b     "x"
              
              local p_value : display %5.4f = r(p)
              
              
              twoway (scatter price weight) || ///
                     (lfit price weight, ///
                    caption("{subscript:`hats'}" ///
                                 "{superscript:`heads_a' `constant'+}{superscript:`x'`heads_b'+ε }" ///
                                 "{superscript:R-squred=`r2'}" ///
                                 "{superscript:P-value (x-hat): `p_value'}", justification(left) position(3)) legend(off))
              [Image: Twoway scatter plot excel style.png]


              To be explicitly clear about the "excel-style" summary statistics, when I write above:
              Code:
                   
              caption("{subscript:`hats'}" ///
                           "{superscript:`heads_a' `constant'+}{superscript:`x'`heads_b'+ε }" ///
                           "{superscript:R-squred=`r2'}" ///
                           "{superscript:P-value (x-hat): `p_value'}", justification(left) position(3)) legend(off))
              I am coding four stacked lines in superscript:
              - line 1 contains the local macro holding the y-hat and x-hat circumflexes
              - line 2 contains text as well as the returned results from the regression matrix called b, e(b)
              - line 3 contains text as well as the local macro r2, which is the regression returned scalar e(r2)
              - line 4 contains the P-value for x-hat in the fitted regression.

              I agree with Nick's caution about relying on and interpreting the social science meaning of a P-value. This snippet, however, should be applicable to many data sets, provided the y-hat and x-hat spacing is adjusted to fit.

              Thank you,
              -Benjamin Chartock



              • #8
                Thank you so much everyone! I'll try it as the first thing tomorrow



                • #9
                  It works
                  Thank you so much from a STATA-beginner



                  • #10
                    Note the typo: R-squred should be R-squared.

                    But Benjamin probably meant something more like this

                    Code:
                    "R{superscript:2} = `r2'"
                    If you put an entire expression in superscript syntax, the effect is just to put everything shown as such in a smaller font. There are then no superscripts shown, in the strict sense.
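
As a small sketch of that point in a complete command, only the exponent goes inside `{superscript:}`, so the rest of the note keeps its normal size. The auto dataset and the use of `note()` rather than `caption()` are illustrative assumptions.

```stata
* Sketch only: auto dataset used for illustration
sysuse auto, clear
quietly regress price weight
local r2 = strtrim(string(e(r2), "%5.4f"))

* Only the 2 is superscripted; "R" and the value stay at normal text size
twoway (scatter price weight) (lfit price weight), ///
    note("R{superscript:2} = `r2'") legend(off)
```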

                    Kristin: For STATA read Stata http://www.statalist.org/forums/help#spelling

                    Everyone: For excel read MS Excel



                    • #11
                      Nick, I apologize. I actually read your guidance on how to ask for help, along with a book about Stata and several guidelines on the internet, before asking, but I'm sorry that I missed it.



                      • #12
                        Hi Nick,

                        The following comment might not be relevant to Kristine.

                        The purpose of the superscripting is to write the equation for the fitted line with statistical formatting. The circumflex is a diacritical mark that is used to represent that a regression line is estimated from collected data instead of representing a theoretical relationship between variables. If one believes that there is a "true" or structural relationship between y and x, but has y & x data from a subsample of the population, the typesetting of the regression equation should include the y-hat circumflex to indicate the estimator is not the theoretical value.

                        Since "y-hat" is not a Unicode character, one can't render it in a Stata graphic. Accordingly, I subscripted the circumflex and superscripted the estimator y to achieve the effect of y-hat.
                        It was a workaround for squeezing the circumflex closer to the estimated coefficient on subsequent lines.


                        -Benjamin
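
One possible alternative to the subscript and superscript workaround, offered as a sketch under assumptions: Unicode does appear to include a precomposed y with circumflex (U+0177, decimal 375), which `uchar()` can return in Stata 14 or later. Whether it displays correctly depends on the graph font, which is not verified here. The auto dataset and the macro names `yhat`, `b0`, and `b1` are illustrative.

```stata
* Sketch only: requires Stata 14 or later; font support for U+0177 is assumed
sysuse auto, clear
quietly regress price weight

* Format the estimated coefficients without leading blanks
local b0 = strtrim(string(_b[_cons], "%9.3f"))
local b1 = strtrim(string(_b[weight], "%9.3f"))

* uchar(375) returns the single character U+0177, y with circumflex
local yhat = uchar(375)

* The fitted equation is shown at normal text size, with no sub- or superscripts
twoway (scatter price weight) (lfit price weight), ///
    note("`yhat' = `b0' + `b1' weight") legend(off)
```

If the font lacks that character, a plain "y" followed by the combining circumflex `uchar(770)` is another option to try.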



                        • #13
                          I don't understand why what you do to get the hat shown has any bearing on how you render R-squared. It sounds as if you have made everything small for the sake of a hat symbol.
