ANOVA p-value decimal places

Edwin Jabbari

#1


02 Jul 2017, 05:10

Hi all.
I would be very grateful if anyone could tell me how to increase the number of decimal places shown for an ANOVA p-value (Prob > F) in Stata. I currently get 4, i.e. 0.0000. Previous threads have mentioned using return list, but this doesn't seem to work after anova...

Ed
Bruce Weaver

#2

02 Jul 2017, 06:25

You could do something like this:

Code:

clear
sysuse auto
anova mpg rep78
*ereturn list
local pModel = Ftail(e(df_m),e(df_r),e(F))
display "Model p = " `pModel'

Output: Model p = .00162691

You may have to tweak the code a bit, depending on how many terms you have in your model and which p-value you want, etc. HTH.
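Building on that, here is a sketch that stores the p-value in a scalar and prints it with an explicit display format, so you choose the number of decimal places instead of relying on the default. The scalar name pModel and the %12.10f format are only illustrative choices.

```stata
sysuse auto, clear
anova mpg rep78

* Store the overall model p-value in a scalar right away,
* before any later estimation command can replace the contents of e()
scalar pModel = Ftail(e(df_m), e(df_r), e(F))

* Default display format versus an explicit format with 10 decimal places
display "Model p = " pModel
display "Model p = " %12.10f pModel
```

Any Stata numeric display format works here; %12.10f just means a width of 12 with 10 digits after the decimal point.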

--
Bruce Weaver
Version: Stata/MP 18.5 (Windows)

William Lisowski

#3

02 Jul 2017, 06:33

The output of help anova shows us that the anova command returns estimation results in e() so the equivalent of return list is ereturn list. You will have to calculate the p-value manually.

Code:

. sysuse auto, clear
(1978 Automobile Data)

. anova price foreign

                         Number of obs =         74    R-squared     =  0.0024
                         Root MSE      =    2966.38    Adj R-squared = -0.0115

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  1507382.7          1   1507382.7      0.17  0.6802
                         |
                 foreign |  1507382.7          1   1507382.7      0.17  0.6802
                         |
                Residual |  6.336e+08         72   8799416.9  
              -----------+----------------------------------------------------
                   Total |  6.351e+08         73     8699526  

. display e(df_m)
1

. display e(df_r)
72

. display e(F)
.17130484

. display Ftail(e(df_m),e(df_r),e(F))
.68018509


Marcos Almeida

#4

02 Jul 2017, 07:53

I just wish to comment on two aspects of this issue, setting aside the understandable desire to obtain a value with maximal precision.

Regarding very small p-values (such as 0.0000, as seems to be the case here), I suspect most journals (at least in the health sciences) will require only that p be reported as < 0.001.

Code:

. use http://www.stata-press.com/data/r15/systolic.dta
(Systolic Blood Pressure Data)

. anova systolic drug

                         Number of obs =         58    R-squared     =  0.3355
                         Root MSE      =    10.7211    Adj R-squared =  0.2985

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  3133.2385          3   1044.4128      9.09  0.0001
                         |
                    drug |  3133.2385          3   1044.4128      9.09  0.0001
                         |
                Residual |  6206.9167         54    114.9429  
              -----------+----------------------------------------------------
                   Total |  9340.1552         57   163.86237  

. display Ftail(e(df_m),e(df_r),e(F))
.0000575

Therefore, in the example above, despite the more precise rendering of the p-value, what gets reported to the journal would be the same, and with good reason, I dare say.
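If you follow that reporting convention, a small sketch like the one below prints either "p < 0.001" or the p-value to three decimals. It assumes the systolic example data used above can be loaded from the Stata Press site and uses the 0.001 cutoff mentioned above; the scalar name p_drug is my own.

```stata
use http://www.stata-press.com/data/r15/systolic.dta, clear
quietly anova systolic drug
scalar p_drug = Ftail(e(df_m), e(df_r), e(F))

* Report "p < 0.001" below the cutoff, otherwise three decimals
if scalar(p_drug) < 0.001 {
    display "p < 0.001"
}
else {
    display "p = " %5.3f scalar(p_drug)
}
```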

On the other hand, when p-values are "high", so to speak, extra precision probably adds no insight either:

Code:

. anova systolic disease

                         Number of obs =         58    R-squared     =  0.0523
                         Root MSE      =    12.6861    Adj R-squared =  0.0179

                  Source | Partial SS         df         MS        F    Prob>F
              -----------+----------------------------------------------------
                   Model |  488.63938          2   244.31969      1.52  0.2282
                         |
                 disease |  488.63938          2   244.31969      1.52  0.2282
                         |
                Residual |  8851.5158         55   160.93665  
              -----------+----------------------------------------------------
                   Total |  9340.1552         57   163.86237  

. display Ftail(e(df_m),e(df_r),e(F))
.22816436

In short, apart from a couple of fields (genetics being one of them), I'm afraid the extra effort to report a very precise p-value risks giving the p-value too much weight.

Best regards,

Marcos


William Lisowski

#5

02 Jul 2017, 08:53

Despite having provided sample code to accomplish what was desired, I did so somewhat reluctantly and on the whole agree with the analysis in post #4 by Marcos, which was better expressed than I could have done.
Edwin Jabbari

#6

02 Jul 2017, 10:02

Dear all,

Thanks very much for your helpful posts.
I should say that I'm asking because I'm running multiple ANOVAs, so I need to check whether the p-values reach a corrected significance level that has 6 decimal places.

With this in mind, I tried the code suggested in a couple of the messages above,
display Ftail(e(df_m),e(df_r),e(F)), but I don't get a numerical output. Instead, I just get a dot on the line below. Any ideas where I'm going wrong? Ed
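For what it's worth, a minimal sketch of that workflow with the auto data follows. It assumes a Bonferroni correction (0.05 divided by the number of ANOVAs), which may not be the correction actually in use; the outcome list and scalar names are hypothetical. Each p-value is computed immediately after its own anova, before the next one replaces e().

```stata
sysuse auto, clear

* Hypothetical setup: three outcomes, Bonferroni-corrected threshold
local outcomes price mpg weight
local k : word count `outcomes'
scalar alpha_corr = 0.05 / `k'

foreach y of local outcomes {
    quietly anova `y' foreign
    * Compute the p-value right after each anova
    scalar p_`y' = Ftail(e(df_m), e(df_r), e(F))
    * The expression in parentheses displays 1 if significant, 0 otherwise
    display "`y': p = " %12.10f p_`y' "   below corrected level: " (p_`y' < alpha_corr)
}
```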
Edwin Jabbari

#7

02 Jul 2017, 10:03

In particular, William, could you please explain how you calculated your p-value manually?
Many thanks.
William Lisowski

#8

02 Jul 2017, 10:14

Reading help anova, we see the following among the much larger list of stored results:

Code:

Stored results

    anova stores the following in e():

    Scalars
      ...
      e(df_m)        model degrees of freedom
      ...
      e(df_r)        residual degrees of freedom
      ...
      e(F)           F statistic
      ...

the values of which were displayed in my post #3.

Reading help Ftail, we see

Code:

Ftail(df1,df2,f)
       Description:  the reverse cumulative (upper tail or survivor) F distribution
                     with df1 numerator and df2 denominator degrees of freedom;
                     1 if f < 0

So

Code:

Ftail(e(df_m),e(df_r),e(F))

is, for the example in post #3, the probability that an F-distributed random variable with 1 numerator and 72 denominator degrees of freedom (the values of e(df_m) and e(df_r)) equals or exceeds 0.1713 (the value of e(F)).
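As a sanity check on that interpretation, the sketch below computes the same tail probability three ways for the price/foreign example: with Ftail(), as one minus the cumulative F() function, and as a two-sided t probability via ttail(). The last form relies on the fact that an F statistic with 1 numerator degree of freedom is the square of a t statistic with e(df_r) degrees of freedom, so it only applies when e(df_m) is 1. The scalar names p1 to p3 are illustrative.

```stata
sysuse auto, clear
quietly anova price foreign

* Upper-tail probability directly
scalar p1 = Ftail(e(df_m), e(df_r), e(F))
* The same quantity as one minus the cumulative F distribution
scalar p2 = 1 - F(e(df_m), e(df_r), e(F))
* With 1 numerator df, F = t^2, so P(F >= f) = 2 * P(T >= sqrt(f))
scalar p3 = 2 * ttail(e(df_r), sqrt(e(F)))

display %12.10f p1 _n %12.10f p2 _n %12.10f p3
```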

With regard to the question about what you're doing wrong, did you issue the command you cited immediately after doing the anova? Subsequent commands may replace the contents of e(). If it happens again, issue the command

Code:

ereturn list

to see what is in e().
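To illustrate one likely cause of the lone dot, the sketch below stores the p-value right after anova and then runs a different estimation command (logit here is an arbitrary choice) that replaces e(). Because logit stores no e(F) or e(df_r), Ftail() receives missing arguments and display shows a missing value (.), while the previously stored scalar is unaffected. The scalar name p_foreign is my own.

```stata
sysuse auto, clear
anova price foreign
* Save the p-value while e() still holds the anova results
scalar p_foreign = Ftail(e(df_m), e(df_r), e(F))

* A later estimation command replaces the contents of e()
quietly logit foreign price
ereturn list

* e(F) and e(df_r) are now missing, so this displays .
display Ftail(e(df_m), e(df_r), e(F))
* The stored scalar still holds the anova p-value
display %12.10f p_foreign
```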

If that doesn't help, you should review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem.

Section 12.1 is particularly pertinent:

12.1 What to say about your commands and your problem

Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!

Read about using CODE blocks to copy commands and output from your Results window and paste them into a CODE block in a Statalist post, as I did in post #3. It will be important to see your anova, ereturn, and display commands and their output.

Last edited by William Lisowski; 02 Jul 2017, 10:24.
Edwin Jabbari

#9

03 Jul 2017, 00:55

Thanks! Really helpful.
