
  • The use of p-values has been criticized by the American Statistical Association.

    Hi everyone,

    My research area is economics, and for my research I use panel data on listed firms in the USA from 2000 to 2020. For my model, I use -xtreg y x, fe vce(cluster id)-.

    My reviewer wrote: "The use of p-values has been criticized by many organizations, including the American Statistical Association. Because of this, after several years of warnings, this year the organizing committee decided not to include any paper using p-values. Please revise your paper by using reliable alternative techniques."

    So, I'm very confused, because all economics journals still use p-values!

    Can anyone here give me some reasoning to respond to this feedback, and what are reliable alternative techniques?

    Regards
    --------------------
    (Stata 15.1 MP)

  • #2
    The American Statistical Association's position (of which I am a strong supporter) is set out in https://www.tandfonline.com/doi/full...5.2019.1583913 and https://www.tandfonline.com/toc/utas20/73/sup1. The former is a summary statement, and the latter is a compendium of 43 supporting papers, many of which offer alternative approaches, some of which may be suitable for your paper.

    I, for one, am happy to learn that this is finally making some penetration into economics. I am sure there are others on this Forum who will, however, disagree. The issue remains controversial, and I do not foresee resolution anytime soon.

    Comment


    • #3
      Personally, I think it is rather unfair for a reviewer to give such feedback without any further guidance. There is such a huge range of alternatives to pick from that I am not sure you can please all reviewers with any given solution (since some still want to see p-values). That being said, probably the most popular alternative is confidence intervals. These are reported for each coefficient in your xtreg results table and require no computation beyond what you have already done.
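      For instance, here is a minimal sketch using a built-in example panel rather than your own data (the variable choices are purely illustrative):

      webuse nlswork, clear
      xtset idcode year
      xtreg ln_wage ttl_exp, fe vce(cluster idcode)
      * the 95% CIs appear in the rightmost columns of the output;
      * to report, say, 90% intervals instead:
      xtreg ln_wage ttl_exp, fe vce(cluster idcode) level(90)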
      Best wishes

      (Stata 16.1 MP)

      Comment


      • #4
        Linh:
        while I totally agree with Clyde here (and in many other posts, in fact), whenever a methodological upheaval comes up, the risk is that hardliners take control and, in this case, reject p-values altogether.
        I am far from being a p-value fan: most of the time I find statistically significant results as informative as their non-significant counterparts, since the intriguing part of the game is trying to understand the reasons underlying these opposite situations. I also read with great interest the articles that Clyde mentioned. Still, I would support debating this issue within each scientific society/research field and drafting guidelines concerning 95% CIs, p-values, and alternative approaches, also in light of the trivial evidence that research fields differ in many respects.
        As far as your reviewer's advice is concerned, you can probably replace p-values with 95% CIs.
        However, I find it difficult to replace p-values with 95% CIs for statistical procedures that are the building blocks of panel data econometrics, such as the -hausman- test.
        Kind regards,
        Carlo
        (StataNow 18.5)

        Comment


        • #5
          Originally posted by Carlo Lazzaro View Post
          However, I find it difficult to replace p-values with 95% CIs for statistical procedures that are the building blocks of panel data econometrics, such as the -hausman- test.
          I am not an economist, but I feel that p-values are especially overrated in procedures such as the hausman test (which, as I understand it, is routinely used to test the consistency of fe and re specifications). Instead of looking at some omnibus p-value, which is based on all coefficients and is additionally arbitrarily dichotomized, look at the differences in the coefficients of interest between the respective models, as in the sketch below. Are the differences in these coefficients substantively meaningful? If so, you might want to give those differences more thought. If not, why should you care about the (overall) differences being "statistically significant" or not?
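          Here is a minimal sketch of that comparison, using a built-in example panel with illustrative variables (your own y and x would go in their place):

          webuse nlswork, clear
          xtset idcode year
          xtreg ln_wage ttl_exp tenure, fe
          estimates store fe
          xtreg ln_wage ttl_exp tenure, re
          estimates store re
          * side-by-side coefficients and standard errors from both models
          estimates table fe re, b(%9.4f) se
          * the usual omnibus test, for comparison
          hausman fe re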

          Comment


          • #6
            You have sent your paper to a not-very-good journal, and very predictably you have gotten a not-very-good referee.

            1. If you really want to publish your paper in such a not-very-good journal, and
            2. if this is the only problem that this not-very-good referee has with your paper,

            then try to please them. Try to figure out what they want from you: some journals have this (in my view idiotic) policy of prohibiting p-values described on their website or in a statement by their editors. If they do not have it as an official policy, drop a line to the editor and see what instructions they give you.

            Everybody has an axe to grind, and some people with their axes to grind are particularly aggressive and obnoxious. The American Statistical Association foolishly allowed such aggressive and obnoxious people to run amok in its "official statements on p-values". I personally lost a lot of respect for the American Statistical Association in the course of this p-values fiasco.

            Daniel Klein makes a very good point, but it is not a point against p-values, as he seems to think.
            1. In econometrics we generally prefer p-values over "dichotomising" into significant and insignificant. I really do not feel that at any point we raised on a pedestal some arbitrary threshold such as 5%, as some other less enlightened disciplines have done.
            2. Every good book on econometrics sooner or later explains something along the lines of "a highly statistically significant mouse might not be that interesting, and a statistically insignificant elephant might be very interesting." So in econometrics we generally care about the sizes of estimated effects, apart from statistical significance.
            3. The proposal to test only the coefficient(s) of interest in a Hausman test appears in Wooldridge, Jeffrey M., Econometric Analysis of Cross Section and Panel Data, MIT Press, 2002. On p. 120 Wooldridge explains how you can construct a Hausman t-statistic; a sketch of the idea follows this list.
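            For a single coefficient, under the textbook assumption that the RE estimator is efficient under the null (so the variance of the difference is the difference of the variances), the statistic is t = (b_FE - b_RE)/sqrt(se_FE^2 - se_RE^2). A minimal Stata sketch with a built-in example panel and purely illustrative variables:

            webuse nlswork, clear
            xtset idcode year
            quietly xtreg ln_wage ttl_exp, fe
            scalar b_fe = _b[ttl_exp]
            scalar v_fe = _se[ttl_exp]^2
            quietly xtreg ln_wage ttl_exp, re
            scalar b_re = _b[ttl_exp]
            scalar v_re = _se[ttl_exp]^2
            * Hausman t-statistic for this single coefficient
            scalar t_h = (b_fe - b_re)/sqrt(v_fe - v_re)
            display "Hausman t = " t_h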

            Comment


            • #7
              I don't mind p-values, and I think they can often be helpful. My big complaint is when papers only talk about the sign and statistical significance of effects and do not say much about substantive significance. That is why I am a big fan of margins and other techniques that can help make the substantive significance much clearer.
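              For example, here is a minimal sketch (with a built-in dataset and illustrative variables) of how margins turns hard-to-read logit coefficients into substantively interpretable average marginal effects:

              webuse nlswork, clear
              logit union ttl_exp i.race, vce(cluster idcode)
              * average marginal effects: the change in the probability of
              * union membership, which is easier to interpret than the raw
              * logit coefficients
              margins, dydx(ttl_exp race)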

              I like this book:

              https://www.amazon.com/Understanding.../dp/041587968X
              -------------------------------------------
              Richard Williams, Notre Dame Dept of Sociology
              StataNow Version: 18.5 MP (2 processor)

              EMAIL: rwilliam@ND.Edu
              WWW: https://www3.nd.edu/~rwilliam

              Comment


              • #8
                It has always struck me as odd that the null hypothesis is the default winner. The costs of a mistake should be factored in. For example, gender discrimination in earnings is a serious thing. But if the indicator of discrimination is "only" significant at the .06 level, we would conclude there is no proof of discrimination. Or at least, some would.

                Conversely, suppose a study found women make $100 a year less than men. If the sample is large enough, that difference might be significant at the .01 level. But still, is a difference of $100 a year substantively significant, especially when you consider that the data and the model probably aren't perfect anyway?
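                A quick simulation sketch of that second point, with entirely made-up numbers (a $100 gap around a $50,000 mean):

                clear
                set seed 12345
                set obs 1000000
                generate female = runiform() < 0.5
                generate earnings = 50000 - 100*female + rnormal(0, 10000)
                regress earnings female
                * at this sample size the $100 gap is estimated very precisely
                * and is "highly significant", yet it is only about 0.2% of
                * mean earnings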
                -------------------------------------------
                Richard Williams, Notre Dame Dept of Sociology
                StataNow Version: 18.5 MP (2 processor)

                EMAIL: rwilliam@ND.Edu
                WWW: https://www3.nd.edu/~rwilliam

                Comment


                • #9
                  And it turns out that the views in economics and sociology are not that different, because most (good empirical) economists would agree with everything Richard Williams is saying.

                  Indeed, raising on a pedestal some arbitrary threshold such as the 5% significance level and talking only about signs and significance (and there is an even worse version, where people talk only about signs, without even significance) are anti-scientific practices which must be uprooted.

                  The p-value has nothing to do with these anti-scientific practices, and the p-value is a very useful statistic.

                  In the end, the consolidated position of the American Statistical Association is very reasonable, and of course I agree with it. It is just that it is a trivial Mickey Mouse point that anybody who has taken a decent course on the basics of statistics or econometrics should know:
                  "Assuming that the null hypothesis, plus the auxiliary assumptions needed to derive the distribution of the test statistic, holds, the p-value is the probability of observing a statistic as extreme as or more extreme than the statistic actually observed."
                  In the consolidated position they conclude that many people attach other, incorrect meanings to the p-value... Well, it is not a problem of the p-value that people refuse to learn a statement as trivial as the definition of a p-value.

                  So the consolidated position is fine; the only problem is that it is a trivial point (people misunderstand p-values). The problem comes after that, when people like the referee in the original post misrepresent this position and incorrectly imply that the Association has said that we must not use p-values. The Association has said nothing like this in the consolidated statement. What they are saying is that whoever has not learnt the definition of a p-value yet should learn it.

                  I have read Geoff Cumming's book cited in #7. It is a good book. Geoff Cumming advocates confidence intervals over p-values. But the point is a bit behavioural in this case, because the p-value and the confidence interval carry the same information: if you give me the p-value (and the point estimate), I can calculate the standard error and construct the confidence interval.
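                  A minimal sketch of that back-calculation for a z-based two-sided test of a zero null, with made-up numbers:

                  * suppose a paper reports b = 0.50 with two-sided p = 0.03
                  scalar b = 0.50
                  scalar p = 0.03
                  scalar z = invnormal(1 - p/2)   // |z| implied by the p-value
                  scalar se = b/z                 // since z = b/se
                  display "se = " se
                  display "95% CI: " b - invnormal(.975)*se " to " b + invnormal(.975)*se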

                  I like confidence intervals because they focus on the parameter space. This is good. Still I prefer p-values.

                  Comment


                  • #10
                    Originally posted by Richard Williams View Post
                    It has always struck me as odd that the null hypothesis is the default winner.
                    But you can define the null- and alternative-hypothesis pair any way you want, constructing the pair in a manner that makes the most sense for the research question at hand. For example, there is a fairly extensive statistical (and pharmaceutical) literature behind the practice of so-called bioequivalence testing in the pharmaceutical industry, in which the default scientific hypothesis is that two formulations of a medication or two sources of a biologic drug are different in some substantive manner, and the study must be powered to detect the case in which they are substantively the same. A sketch of the underlying logic follows below.
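                    Here is a minimal sketch of the two one-sided tests (TOST) logic behind such equivalence testing, with made-up summary numbers; real bioequivalence analyses typically work with log-scale ratios and 90% CIs:

                    scalar diff  = 0.02   // observed test-minus-reference difference
                    scalar se    = 0.05   // standard error of the difference
                    scalar delta = 0.20   // equivalence margin
                    * two one-sided tests: conclude equivalence only if BOTH reject
                    scalar z_lower = (diff + delta)/se   // H0: diff <= -delta
                    scalar z_upper = (diff - delta)/se   // H0: diff >= +delta
                    display "p(lower) = " 1 - normal(z_lower)
                    display "p(upper) = " normal(z_upper)
                    * equivalently: conclude equivalence if the 90% CI for diff
                    * lies entirely within (-delta, +delta)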

                    Originally posted by Joro Kolev View Post
                    You have sent your paper to a not-very good journal, and very predictably you have gotten a not-very good referee.
                    I got the impression that it isn't a paper submission to a journal, but rather an abstract submission to an annual meeting for a talk or a poster session. (". . . this year, the organizing committee decided . . . I'm very confused because all economic journals still use p-value!")

                    Comment


                    • #11
                      Economists have been aware of this issue at least since the 1985 paper by D. McCloskey:

                      The Loss Function Has Been Mislaid: The Rhetoric of Significance Tests, The American Economic Review, Vol. 75, No. 2, Papers and Proceedings of the Ninety-Seventh Annual Meeting of the American Economic Association (May, 1985), pp. 201-205.

                      This brilliant and seminal paper is a real pleasure to read and I am sure many in this forum will enjoy it.

                      Comment


                      • #12
                        Re: Richard Williams in #8: p. 297 (bottom of the first column) of this paper by Charles Manski has a nice, concise statement endorsing this viewpoint. https://www.tandfonline.com/doi/full...5.2018.1513377

                        For those who may not have access to the paper, here is the excerpt:
                        ...there are several reasons why hypothesis testing may yield unsatisfactory results for medical decisions and other forms of treatment choice.

                        These include

                        1. Use of Conventional Asymmetric Error Probabilities: It has been standard to fix the probability of Type I error at 5% and the probability of Type II error at 10–20%. The theory of hypothesis testing gives no rationale for selection of these conventional error probabilities. It gives no reason why a clinician concerned with patient welfare should find it reasonable to make treatment choices that have a substantially greater probability of Type II than Type I error.

                        2. Inattention to Magnitudes of Losses to Welfare When Errors Occur: A clinician should care about more than the probabilities of Type I and II error. He should care as well about the magnitudes of the losses to patient welfare that arise when errors occur. A given error probability should be less acceptable when the welfare difference between treatments is larger, but the theory of hypothesis testing does not take this into account.
                        Last edited by John Mullahy; 24 Jul 2021, 07:54. Reason: added excerpt's text

                        Comment


                        • #13
                          Off-topic here but hopefully funny.
                          As John mentioned Charles "Chuck" Manski: I have had the pleasure of exchanging some emails with him during this very unfortunate pandemic.
                          He was particularly sad that he could not visit Italy.
                          He fell in love with Bergamo (https://en.wikipedia.org/wiki/Bergamo), which was severely hit by the pandemic last year, during one of his trips to Italy, and he became a foreign supporter of the local football team, Atalanta (https://en.wikipedia.org/wiki/Atalanta_B.C.).
                          Kind regards,
                          Carlo
                          (StataNow 18.5)

                          Comment


                          • #14
                            Regarding Joao Santos Silva in #11 and John Mullahy in #12 (the second point by Charles Manski):

                            Everybody agrees (or would agree, if you explained the issues to them as Manski, McCloskey, and Ziliak have done) that welfare/utility losses are a much superior way to assess uncertainty compared to p-values and hypothesis testing.

                            The statistician who first pushed for these welfare/utility assessments, and made some progress on this front, is Abraham Wald (the same guy as in the Wald test). D. McCloskey cites Abraham Wald (I have not read the particular paper Joao cites, but I have read a couple of papers by McCloskey and Ziliak, including their influential survey: McCloskey, D. N., & Ziliak, S. T. (1996). The standard error of regressions. Journal of Economic Literature, 34(1), 97-114).

                            Charles Manski also cites Wald, A. (1950), Statistical Decision Functions, New York: Wiley.

                            At least one paper of the 43 that Clyde mentioned in the American Statistical Association collection on p-values discusses these statistical decision functions, I think.

                            So then why don't we use these Statistical Decision Functions?

                            1. Things get complicated with these statistical decision functions pretty fast. People do not want to memorise the one-sentence definition of a p-value, and we are expecting these same people to understand statistical decision functions...

                            2. There is a fundamental problem: we do not know people's preferences and utility functions. And when we do not know the utility or welfare function, we cannot calculate utility or welfare losses.

                            Overall, it is easy to explain the problems with hypothesis testing and p-values, but it turns out to be very hard to come up with something that is better and yet reasonably simple to implement. This is why I think the 43 or so papers in the symposium on p-values were a fiasco: everybody was grinding his own axe and pushing his favourite topic, be it Bayesian inference or statistical decision functions or whatever else. In the end the reader is left with nothing, because they failed to provide an alternative on which they could all agree.

                            Comment


                            • #15
                              Re: #14: I might suggest taking a look at this recent paper in Value in Health by Manski and Tetenov, in which they recognize the challenges of the statistical decision approach and discuss notions of "near optimality" that can be implemented straightforwardly in applications. https://www.valueinhealthjournal.com...045-0/fulltext

                              We use the concept of near-optimality to evaluate criteria for treatment choice. This concept jointly considers the probability and magnitude of decision errors. An appealing criterion from this perspective is the empirical success rule, which chooses the treatment with the highest observed average patient outcome in the trial.
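                              For what it's worth, the empirical success rule itself is trivially simple to compute; a minimal sketch with a made-up two-arm trial:

                              * hypothetical two-arm trial: arm (0/1) and outcome y
                              clear
                              set seed 2021
                              set obs 200
                              generate arm = _n > 100
                              generate y = 0.4 + 0.1*arm + rnormal()
                              * empirical success rule: choose the arm with the
                              * higher observed average outcome (no p-value or
                              * threshold involved)
                              mean y, over(arm)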

                              Comment
