How to create a scatter plot with p-value and r^2 included

Kristine Hansen

Join Date: Apr 2016

Posts: 6
#1

18 Apr 2016, 08:17

Hi,
Is it possible to make a scatter plot that includes a box with the p-value and r^2 value at the top (the same way excel shows an r^2 value in an xy-plot)?
Nick Cox

Join Date: Mar 2014

Posts: 35208
#2

18 Apr 2016, 09:02

Yes, it's entirely possible. aaplot from SSC -- for which see e.g. http://www.statalist.org/forums/foru...updated-on-ssc -- shows by example that you can program to pick up results from a regression and put them on a graph.

aaplot deliberately does not show P-values, which are often incorrect or scientifically meaningless, and overrated anyway, so you'd need to study the code if you wanted to adapt it to your purposes.

Alternatively, any scatter plot can show added text in titles (including captions and notes) or added text within the plot region.
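
A minimal sketch of the two placements just mentioned, a note under the plot and a text box inside the plot region. The auto dataset, the variables price and weight, and the text() coordinates are illustrative assumptions, not part of the original question.

```stata
* Sketch only: dataset, variables and coordinates are assumptions.
sysuse auto, clear
quietly regress price weight
local r2 : display %5.3f e(r2)

* Option 1: show the result in a note below the plot
twoway scatter price weight, note("R-squared = `r2'")

* Option 2: show the result in a box inside the plot region;
* text() takes a y coordinate, then an x coordinate, in axis units
twoway scatter price weight, ///
    text(15000 2000 "R-squared = `r2'", box placement(e))
```

The coordinates in text() would need adjusting to the range of whatever variables are plotted.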
Kristine Hansen

Join Date: Apr 2016

Posts: 6
#3

18 Apr 2016, 09:40

Thank you, Nick. I'll try this if it is not possible to make a plot with a p-value.
I have tried to add notes to my original plot and managed to add the r^2 value but not the p-value. Can you explain how to do it?

I know the p-value is sometimes meaningless and very dependent on the number of observations, but I have to include it, and would really like it to be part of the graph.
Nick Cox

Join Date: Mar 2014

Posts: 35208
#4

18 Apr 2016, 09:55

You would need to retrieve it after the regression as

fprob(e(df_m),e(df_r),e(F))
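
A minimal sketch of how that expression might be used after a regression. It assumes the auto dataset and uses Ftail(), which as far as I know is the documented name for the same upper-tail F probability that fprob() returns.

```stata
* Sketch only: dataset and variables are assumptions.
sysuse auto, clear
quietly regress price weight

* P-value of the overall F test, built from the stored e() results;
* Ftail() is assumed here to be equivalent to fprob()
local p  : display %5.4f Ftail(e(df_m), e(df_r), e(F))
local r2 : display %5.4f e(r2)

* Two quoted strings in note() give two stacked lines
twoway scatter price weight, note("R-squared = `r2'" "P-value = `p'")
```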

Roman Mostazir

Join Date: Apr 2014

Posts: 868
#5

18 Apr 2016, 10:17

Another way:

Code:

qui reg y x     // run the regression; quietly is optional

loc r2 : di %12.2f e(r2)*100   // R-squared as a percentage, two decimal places
loc t =  _b[x]/_se[x]   // t-statistic for the slope on x
loc p : di %12.3f 2*ttail(e(df_r),abs(`t'))   // two-sided p-value for the slope
twoway (scatter y x), title("R-square (%): `r2'") note("p-value: `p'")   // scatter plot

For calculation of the p-value, consult "Where did my p-values go?", Maarten Buis, Stata tip 53.
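
With a single predictor, the t-based p-value for the slope and the p-value of the overall F test (the fprob() expression earlier in the thread) should coincide, because F is the square of t. A quick check, assuming the auto dataset and using Ftail() as the assumed equivalent of fprob():

```stata
* Sketch only: dataset and variables are assumptions.
sysuse auto, clear
quietly regress price weight

* Two-sided p-value from the t statistic of the slope
local t = _b[weight]/_se[weight]
scalar p_t = 2*ttail(e(df_r), abs(`t'))

* P-value of the overall F test
scalar p_F = Ftail(e(df_m), e(df_r), e(F))

display "p from t-test: " %10.8f p_t "   p from F-test: " %10.8f p_F
assert abs(p_t - p_F) < 1e-10
```

With more than one predictor the two differ, so which one to print on the graph depends on whether the claim is about one coefficient or the whole model.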

Last edited by Roman Mostazir; 18 Apr 2016, 10:21. Reason: Author name corrected

Roman


Roman Mostazir

Join Date: Apr 2014

Posts: 868
#6

18 Apr 2016, 10:22

Corrected, thanks Nick.

Roman
Benjamin Chartock

Join Date: Apr 2016

Posts: 6
#7

18 Apr 2016, 11:13

Hi Statalisters,

With credit to Roman and Nick, here's a snippet of code that produces a scatterplot with a regression line and the relevant summary statistics. I hope this code helps produce a visualization of the data in the spirit of an excel graphic and can be modified for use with many data sets. Note: the solution uses Unicode lightly, requiring Stata version 14.

Code:

sysuse auto
regress price weight
test _b[weight]=0
mat b = e(b)
local constant : display %4.3f = b[1,2]
display `constant'
local x : display %4.3f = b[1,1]
local r2 : display %5.4f = e(r2)
local hats : display _skip(1) "̂" _skip(26) "̂"
local heads_a "y="
local heads_b "x"
local p_value : display %5.4f = r(p)
twoway (scatter price weight) || ///
    (lfit price weight, ///
    caption("{subscript:`hats'}" ///
    "{superscript:`heads_a' `constant'+}{superscript:`x'`heads_b'+ε }" ///
    "{superscript:R-squred=`r2'}" ///
    "{superscript:P-value (x-hat): `p_value'}", justification(left) position(3)) legend(off))

To be explicitly clear about the "excel-style" summary statistics, when I write above:

Code:

caption("{subscript:`hats'}" ///
"{superscript:`heads_a' `constant'+}{superscript:`x'`heads_b'+ε }" ///
"{superscript:R-squred=`r2'}" ///
"{superscript:P-value (x-hat): `p_value'}", justification(left) position(3)) legend(off))

I am coding four stacked lines:
-line 1 contains the local macro hats, which supplies the circumflexes for y-hat and x-hat
-line 2 contains text as well as the returned results from the regression matrix called b, e(b)
-line 3 contains text as well as the local macro r2, which is the regression returned scalar e(r2)
-line 4 contains the P-value for x-hat in the fitted regression.

I agree with Nick's caution about relying on and interpreting the social science meaning of a P-value. This snippet, however, should be applicable to many data sets, provided the spacing of the y-hat and x-hat circumflexes is adjusted to fit.

Thank you,
-Benjamin Chartock
Kristine Hansen

Join Date: Apr 2016

Posts: 6
#8

18 Apr 2016, 13:33

Thank you so much everyone! I'll try it as the first thing tomorrow
Kristine Hansen

Join Date: Apr 2016

Posts: 6
#9

19 Apr 2016, 03:25

It works
Thank you so much from a STATA-beginner
Nick Cox

Join Date: Mar 2014

Posts: 35208
#10

19 Apr 2016, 04:04

Note the typo: R-squred should be R-squared.

But Benjamin probably meant something more like this

Code:

"R{superscript:2} = `r2'"

If you put an entire expression in superscript syntax, the effect is just to put everything shown as such in a smaller font. There are then no superscripts shown, in the strict sense.

Kristine: For STATA read Stata http://www.statalist.org/forums/help#spelling
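
A minimal sketch of that string in use, assuming the auto dataset and a note() as the place to show it. Only the exponent is wrapped in the superscript tag, so the rest of the line stays at normal size.

```stata
* Sketch only: dataset and variables are assumptions.
sysuse auto, clear
quietly regress price weight
local r2 : display %5.4f e(r2)

* Superscript just the 2, not the whole expression
twoway (scatter price weight) (lfit price weight), ///
    note("R{superscript:2} = `r2'") legend(off)
```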

Everyone: For excel read MS Excel
Kristine Hansen

Join Date: Apr 2016

Posts: 6
#11

19 Apr 2016, 05:30

Nick, I apologize. I actually read your guidance on how to ask for help, together with a book about Stata and several guidelines on the internet, before asking, but I'm sorry that I missed it.
Benjamin Chartock

Join Date: Apr 2016

Posts: 6
#12

19 Apr 2016, 14:12

Hi Nick,

The following comment might not be relevant to Kristine.

The purpose of the superscripting is to write the equation for the fitted line with statistical formatting. The circumflex is a diacritical mark that is used to represent that a regression line is estimated from collected data instead of representing a theoretical relationship between variables. If one believes that there is a "true" or structural relationship between y and x, but has y & x data from a subsample of the population, the typesetting of the regression equation should include the y-hat circumflex to indicate the estimator is not the theoretical value.

Since "y-hat" is not a Unicode character, one can't render it in a Stata graphic. Accordingly, I subscripted the circumflex and superscripted the estimator y to achieve the effect of y-hat.
It was a workaround for squeezing the circumflex closer to the estimated coefficient on subsequent lines.
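
One alternative that might be worth trying, offered only as a sketch: attach a combining circumflex directly to the letter with uchar(), so that no subscripting or superscripting is needed. This assumes Stata 14 or later and the auto dataset. Whether the combined glyph renders cleanly depends on the graph font, so treat the result as something to check on screen.

```stata
* Sketch only: dataset, variables and note layout are assumptions.
sysuse auto, clear
quietly regress price weight
local b0 : display %6.3f _b[_cons]
local b1 : display %5.3f _b[weight]

* uchar(770) is U+0302, the combining circumflex; it attaches to the
* preceding letter. Rendering quality depends on the graph font.
local yhat = "y" + uchar(770)

twoway (scatter price weight) (lfit price weight), ///
    note("`yhat' = `b0' + `b1' x") legend(off)
```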

-Benjamin
Nick Cox

Join Date: Mar 2014

Posts: 35208
#13

19 Apr 2016, 17:02

I don't understand why what you do to get the hat shown has any bearing on how you render R-squared. It sounds as if you have made everything small for the sake of a hat symbol.