Dare I suggest the Bayesian approach? Vic Barnett's Comparative Statistical Inference led me there, and Joseph Kadane's Principles of Uncertainty reinforced my thoughts.
I find the p-value answers a question I no longer care to ask: "How unexpected is this outcome if some null hypothesis about the parameter is true?" The question I care about is "what does this outcome tell me about the distribution of the data characterized by the parameter in question, taking into account the uncertainty about the value of the parameter, not to mention nuisance parameters, and perhaps the results of other research? And how can I then use that to make a decision among choices that have differing costs and benefits - or different distributions of costs and benefits?"
Well, those aren't exactly the questions I have in my current work; they're the questions I would have had in the career I'd hoped to have had.
But I wish I had realized as an undergraduate that statistics is just one part of the branch of philosophy that concerns itself with how we know what we know. I'm afraid that a half-century ago I took the whole apparatus of type I and type II errors as if it had the same standing as, say, the foundations of physics and the other "hard" sciences, when it seems it does not. Instead, it was a useful approach when the computational horsepower was not available to adopt other approaches.
So for my part, I see the rejection of the p-value as the arbiter of the meaning of an experiment as a useful tool to force statistics to try harder to address the actual questions non-statisticians have about the meaning of data.
My 2 cents: (1) p-values have value; (2) they are often misused (but so is (almost) everything else); (3) many of the suggested "replacements" (including Bayesian ones) have a monotonic (but often non-linear) relationship with p-values; (4) to me, the real issue is whether people should use a statistical criterion as a "gateway" or "threshold" - regardless of the answer, I see no way to stop others from doing so and would prefer to put my effort into producing the best analysis I can. A recent book (though it vastly overuses the word "sensible" and its relatives) might give, at least, some amusement along with some light on the subject:
Macnaughton, Donald B. (2021), The War on Statistical Significance: The American Statistician vs. the New England Journal of Medicine, Kindle Direct Publishing. (I bought my copy via Amazon, but I believe it is available through other sellers as well.)
Having been hawking the American Statistical Association's critiques of statistical significance (and to a lesser extent of p-values) so much on Statalist for several years now, I've largely stayed out of this discussion. But since it has gone on so long, and because I'm particularly prompted by some of Rich Goldstein's remarks, I'll say a few things here.
(1) p-values have value;
Yes, but that's very faint praise. And since nearly every Stata output that comes with a p-value also comes with a confidence interval, it's hard to see why you would want to use the p-value. The confidence interval is far more easily understood. Yes, you can calculate either knowing the other (and knowing the particular test), but the confidence interval gives you some idea of where the estimate stands in relation to real-world meaningful values of the effect being estimated, whereas the p-value tells you only where it stands in relation to zero, which in most situations is of no interest at all. So why not present the confidence interval? It's understandable with far less work on the part of the reader. (Work which, in reality, most readers will not put in.)
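As a concrete illustration, here is a minimal sketch using the auto dataset shipped with Stata:
Code:
sysuse auto, clear   // example dataset shipped with Stata
regress mpg weight   // output reports the coefficient, its p-value, and a 95% CI
// The confidence interval places _b[weight] on the mpg-per-pound scale, where its
// practical importance can be judged; the p-value measures only distance from zero.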
(2) they are often misused (but so is (almost) everything else);
Yes, but it's a matter of degree. When I read the literature in my discipline, I would say that the vast majority of uses of p-values are misuses. When something is misused that frequently, you begin to think maybe it should seldom be used, and people ought to push back on its use.
(3) many of the suggested "replacements" (including Bayesian ones) have a monotonic (but often non-linear) relationship with p-values;
This is absolutely true. And these other statistics can be readily misused in the same way.
(4) to me, the real issue is whether people should use a statistical criterion as a "gateway" or "threshold"
I think this is absolutely the key. To me, the most objectionable thing about p-values is that people pick some threshold value, usually 0.05 (though it really doesn't matter which threshold), and then try to read certainty--there is an effect, or there is no effect--into results that are inherently uncertain. And it is this behavior that, in my opinion, does the most harm. It leads to irreproducible research results. And its application as a filter for publication leads to a strongly biased literature in which effect size estimates are grossly inflated. If people would just stop doing that, I would chill out about p-values.
- regardless of the answer I see no way to stop others from doing so and would prefer to put my effort into producing the best analysis I can;
Well, I agree with putting effort into producing the best analysis I can. But I prefer not to take a defeatist attitude on this. Here on Statalist, as well as in my day job, at least part of my responsibility, as I see it, is to teach. It is hard to change long-entrenched habits and views, and I don't really hold out much hope of doing that. After all, the most likely way the old bad habits will finally cede ground is when their practitioners age out of the system.
But it is, in my view, shameful that the p < 0.05 (or other threshold) filtering process is still being widely taught to upcoming generations. Since so many of the people who post on Statalist are self-described newbies, I think it's important to try to open their minds and to fight against the bad practices they are being taught in school and in the workplace by their teachers/mentors/supervisors. I take a hard line: I do not allow my students, or the junior faculty I mentor, to use the "s-word" in my presence.
Coincidentally, the summer issue of the Journal of Economic Perspectives was just released, and it contains a paper by Guido Imbens titled "Statistical Significance, p-values, and the Reporting of Uncertainty": https://www.aeaweb.org/articles?id=10.1257/jep.35.3.157
Thanks for sharing that, John. The Cox (2020) article cited on p. 159 does not appear in the reference list, but I assume it is this one:
“Democracy is the worst form of government, except for all the others.” (attributed to Winston Churchill), and p-values are very much the same. Imperfect and messy, but useful.
For responding to your reviewer, (echoing Richard above) I'd move the focus away from a "naked" p-value and over to the practical implications of your (presumably significant) coefficient. You could use Cohen's d (or other effect size metric) to then discuss the magnitude of your results.
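A minimal sketch with Stata's esize command (the variable names outcome and treated are placeholders, not from the original post):
Code:
// Cohen's d, with a confidence interval, for a two-group comparison
esize twosample outcome, by(treated) cohensd
// The magnitude of d (and its CI) supports a substantive discussion of effect
// size rather than a bare p-value.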
__________________________________________________
Assistant Professor, Department of Biostatistics and Epidemiology
School of Public Health and Health Sciences
University of Massachusetts Amherst
“Democracy is the worst form of government, except for all the others.” (attributed to Winston Churchill), and p-values are very much the same. Imperfect and messy, but useful.
As far as I know, the ASA didn't reject p-values, but rather their use to (wrongly) sort results into "significant" and "non-significant".
I am surprised to see an earlier poster lose trust in the ASA because it wanted to clean up the mess that "statistical significance" has caused and continues to cause. Someone is brave enough to push science forward; I think we should celebrate that.
But contrary to democracy, we have better alternatives to p-values: confidence intervals (which should probably be called compatibility intervals), or better still, Bayesian credible intervals (which actually mean what we wrongly assume the traditional confidence interval means).
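For what it's worth, a minimal sketch of obtaining a credible interval in Stata (version 15 or later) via the bayes prefix, under its default priors and using the shipped auto data:
Code:
bayes: regress mpg weight         // MCMC estimation under Stata's default priors
bayesstats summary {mpg:weight}   // posterior mean and 95% credible interval
// Unlike a confidence interval, this interval may be read directly as "the
// coefficient lies in this range with 95% posterior probability."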
P-values are still poorly understood. StataCorp might look into this problem in their presentations.
The p-value says nothing about the probability of H0 (in fact, in most sciences, H0 in the sense of a nil hypothesis is never exactly true).
The p-value is meant to answer this question: If H0 were true (it is not), how large would the probability be of finding the current point estimate or group difference, or a larger association/difference?
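In symbols, for a two-sided test, p = Pr(|T| >= |t_obs|), computed under H0. A minimal sketch of that calculation in Stata, with an illustrative t statistic and degrees of freedom:
Code:
// Two-sided p-value for t = 2.1 with 98 residual degrees of freedom
display 2*ttail(98, abs(2.1))   // ttail() is the upper-tail probability; about 0.038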
The slide below is from one of this year's conferences, a talk given by a representative of StataCorp.
Although not directly about the interpretation of p-values, I have a slightly related grumble about the notation in one of StataCorp's FAQs: FAQ: One-sided tests for coefficients
The following representation of the null hypothesis in that FAQ is not quite correct:
Code:
Ho: coef <= 0
Under the null hypothesis, we postulate that the coefficient is equal to zero and derive the distribution of the test statistic accordingly. The one-sided nature of the test is reflected in the alternative hypothesis.
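A minimal sketch of that logic in Stata (auto data, purely illustrative): the t statistic is referred to its distribution at coef = 0, and the alternative hypothesis determines only which tail is evaluated.
Code:
sysuse auto, clear
regress mpg weight
// One-sided p-value for Ha: coef < 0, from the t statistic computed at coef = 0
display ttail(e(df_r), -_b[weight]/_se[weight])   // Pr(T <= t_obs), by symmetry of t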