
  • Checking the significance of regression coefficients for large sample size

    https://davegiles.blogspot.com/2019/...have.html#more
    Based on the above blog post, I understand that if you have a large sample size your p-value threshold should be reduced, and one rule of thumb is to judge the significance of a regression coefficient (roughly) by comparing its t-statistic to (ln(N))^.5, where N is the sample size.
    What is the logic behind this? Say I have 18,000 observations (1,800 units over 10 years) in my panel and 10 independent variables in my regression; what should the threshold value for the t-statistics be?
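    For concreteness, here is a quick back-of-the-envelope check (my own Python sketch, simply plugging the N above into the blog's rule of thumb; not an endorsement of the rule):

    ```python
    # Compare the rule-of-thumb cut-off sqrt(ln(N)) with the usual 5% critical value.
    # N = 18,000 is the panel size mentioned above; this is only an illustration.
    import math

    N = 18_000
    rule_of_thumb = math.sqrt(math.log(N))   # (ln N)^0.5
    print(f"sqrt(ln({N})) = {rule_of_thumb:.2f}")        # roughly 3.13
    print("conventional two-sided 5% critical value = 1.96")
    ```

    Under that rule a coefficient would need |t| above roughly 3.1, rather than the usual 1.96, before being called significant.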

  • #2
    The premise of that idea is given in the Deaton quote,

    Repairing these difficulties requires that the critical values of test statistics be raised with the sample size, so that the benefits of increased precision are more equally allocated between reductions in Type I and Type II errors. That said, it is a good deal more difficult to decide exactly how to do so, and to derive the rule from basic principles.
    That being said, I don't see why Deaton is mystified at all,

    The effect most noted by empirical researchers is that the null hypothesis seems to be more frequently rejected in large samples than in small. Since it is hard to believe that the truth depends on the sample size, something else must be going on. [...] As the sample size increases, and provided we are using a consistent estimation procedure, our estimates will be closer and closer to the truth, and less dispersed around it, so that discrepancies that were undetectable with small samples will lead to rejections in large samples.
    This is a disturbing quote to me: more precise estimates are being presented as a problem that we need a complex theoretical solution for. To me, this captures everything wrong with treating p<0.05 as the holy grail of a good/correct analysis, a mindset that sees large N as some form of soft cheating. Of course he is a Nobel Laureate and I am nobody, but this perspective seems completely at odds with how most people think about greater sample size and the corresponding increase in precision. Deaton writes a lot of strange things about econometrics and program evaluation that Imbens rebuts in a 2010 paper, and since reading that I have heavily discounted whatever I read by Deaton regarding econometrics.

    All of that to say, I think there is no logic/bad logic behind it.



    • #3
      I agree with Jackson. P values simply are the probabilities of falsely rejecting H0, and this interpretation never changes with sample size. In other words, whether you think it's appropriate to reject H0 should solely depend on your tolerance of Type I error and has nothing to do with sample size.
      Last edited by Fei Wang; 27 Oct 2021, 19:19.



      • #4
        Thanks Fei Wang and Jackson Monroe for the time and support. Further googling turns up something similar, for instance:
        "When sample sizes run into the thousands, many statisticians decrease the significance threshold to .01." David Dranove (https://www.kellogg.northwestern.edu...n%20basics.pdf)

        "Use a t-test with relatively small samples, not with data for the entire population. With very large samples you can reject almost any null hypothesis. 1 ˆ 1 ˆ b b SE t = and as sample size increases, SE diminishes and t approaches infinity, but this doesn't mean that the true b is not really zero, just that it is a large number of SEs away from zero (but as SEs become very small of this starts to be less important)"
        http://depts.washington.edu/lecturer...s/Week%204.pdf

        So I am not quite sure whether, for large samples, we should require a higher t-statistic or not.
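        To see the mechanics in that quote, here is a small simulation sketch in Python (my own illustration with a made-up, practically negligible coefficient; none of the numbers come from the sources above): with a fixed tiny true effect, t = b̂/SE(b̂) grows roughly like √n, so the p-value eventually drops below any fixed threshold.

        ```python
        # Simulate a regression with a tiny but nonzero true slope and watch the
        # t-statistic grow (and the p-value shrink) as the sample size increases.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        true_b = 0.02                      # hypothetical, practically negligible effect

        for n in (100, 1_000, 10_000, 100_000):
            x = rng.normal(size=n)
            y = true_b * x + rng.normal(size=n)
            fit = stats.linregress(x, y)
            t = fit.slope / fit.stderr
            p = 2 * stats.t.sf(abs(t), df=n - 2)
            print(f"n={n:>7}  b_hat={fit.slope:+.4f}  t={t:+.2f}  p={p:.3f}")
        ```

        The estimated slope barely changes as n grows; only its standard error shrinks.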



        • #5
          The reason why people (especially non-statisticians) change the significance threshold with sample size is that they want the p-value to do more than it can. Remember that in statistics "significance" is not a synonym of "relevant" or "important". It has only a very limited meaning, as Fei Wang mentioned. It has its uses, but it cannot tell you whether a result is important, big, relevant or not. The p-value is just not designed for that problem. It is a limitation, but not a flaw: you don't blame a hammer for not working well with screws, you blame the workman for trying to use a hammer on screws. People who change the threshold depending on the sample size are trying to use the p-value for something it is not designed for, and as a result they are guaranteed to fail. So don't do that.

          If you want to know more about how to use p-values see: https://amstat.tandfonline.com/doi/p...5.2016.1154108
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------



          • #6
            Thanks Maarten Buis.
            I think some supporters of the claim that the p-value threshold should be adjusted with sample size base it on Leamer (1978), who argued that the level of significance should be set as a decreasing function of sample size (Leamer, E. 1978, Specification Searches: Ad Hoc Inference with Nonexperimental Data, Wiley, New York).
            Similarly, in the paper How to Choose the Level of Significance: A Pedagogical Note (https://mpra.ub.uni-muenchen.de/6637...aper_66373.pdf), Jae Kim argues that "the level of significance should be chosen with careful consideration of the key factors such as the sample size, power of the test, and expected losses from Type I and II errors."
            Also, Daniel Lakens argues that we should "Justify Your Alpha by Decreasing Alpha Levels as a Function of the Sample Size" (http://daniellakens.blogspot.com/201...ta-should.html).

            In the excellent resource you shared, it is written that "Any effect, no matter how tiny, can produce a small p-value if the sample size or measurement precision is high enough, and large effects may produce unimpressive p-values if the sample size is small or measurements are imprecise. Similarly, identical estimated effects will have different p-values if the precision of the estimates differs."

            In Dave Giles's blog post, the rule of thumb for the t-statistic cut-off is given as:

            We should reject H0 against a 2-sided alternative hypothesis if |t| > √[n(n^(1/n) − 1)], where n denotes the sample size (this is the case of a single linear restriction, q = 1).
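            Incidentally, n^(1/n) = exp(ln(n)/n) ≈ 1 + ln(n)/n for large n, so √[n(n^(1/n) − 1)] ≈ √ln(n), which is where the (ln(N))^.5 rule of thumb from my first post comes from. A quick numerical check (my own Python sketch, not from the blog):

            ```python
            # Compare Leamer's exact single-restriction cut-off with its sqrt(ln n) approximation.
            import math

            for n in (100, 1_000, 18_000, 1_000_000):
                exact = math.sqrt(n * (n ** (1 / n) - 1))   # sqrt[n(n^(1/n) - 1)]
                approx = math.sqrt(math.log(n))             # sqrt(ln n)
                print(f"n={n:>9}  exact cut-off={exact:.3f}  sqrt(ln n)={approx:.3f}")
            ```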

            I think the more I google and read about this, the more puzzled I get, so I may have to find a way out.



            • #7
              You seem to think that the fact that everything is significant when your sample is large enough indicates that you need to adjust for sample size. That is wrong. So my advice is that you stop googling, and instead read a good introductory statistics textbook and read up on what a p-value means exactly.
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------



              • #8
                Neelakanda Krishna, the quotes you cite are indicative of the problems Maarten and I have already discussed: treating the p-value like it is some sort of tool to decide whether or not you've conclusively answered your question. If you have more data and a sharp null hypothesis (e.g., mu = 0), you will almost always tend to have a smaller p-value as n increases. The fact that the above authors want to "correct" for that shows that they are treating p < alpha like the goal or the sine qua non of a good study, and that large N is basically "cheating" to them. "Well, N = 40,000, so of course you rejected the null" is not a valid complaint, and despite my respect for the above authors, they are asking the p-value to be more than it is when they do so. The p-value has a well-defined meaning in frequentist statistics; making weird ad-hoc adjustments that have fuzzy or no logic, and that the adjusters themselves can't really justify, is not best practice.



                • #9
                  Dear All,

                  This is a very interesting discussion and I would like to add my two cents. I start by noting that the rule mentioned above (using (ln(N))^.5 as the critical value of a two-sided t-test) is not an ad-hoc adjustment: Leamer derives it based on standard and solid Bayesian arguments; indeed, that is exactly the rule implicit in the popular BIC. I also do not agree that Leamer and Deaton see large samples as a form of cheating or as a problem.

                  In my view, the logic of using different sizes of the tests (that is, different probabilities of type-I error) for different sample sizes is as follows. As we all know, traditional tests of hypotheses are based on the idea that rejecting a true null is more serious than not rejecting a false one, so we fix an acceptable level for the type-I error and let the type-II error be whatever it is. The level of type-I error should depend on the particular application, and there is no reason to assume that 0.01, 0.05, or any other ad-hoc choice, is generally a good rule. In deciding the size of the test to use, it is natural to consider what the power of the test might be; after all, there is no point in doing a test with no power.

                  So, if my sample is small, I may have to tolerate a relatively large probability of type-I error, say 0.1, to have decent power. However, if my sample is very large, that cut-off will lead to a test where the probability of a type-II error is virtually zero, and the probability of the more serious type-I error is still 10%. This goes against the spirit of hypothesis testing, and therefore it is reasonable to use a different cut-off with a large sample, so that we reduce the probability of type-I error while still having decent power. In that sense, the tolerance of type-I error depends on the potential power of the test, and therefore depends on the sample size.
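                  To make that trade-off concrete, here is a small Python sketch (my own illustration with a hypothetical effect of 0.2 and error standard deviation of 1): with n = 100 one may need alpha = 0.10 to get decent power, whereas with n = 100,000 one can drop alpha to 0.001 and still have power essentially equal to one.

                  ```python
                  # Power of a two-sided z-test of H0: b = 0 when the true effect is b = 0.2
                  # and the error s.d. is 1, so SE(b_hat) is roughly 1/sqrt(n).
                  import math
                  from scipy import stats

                  def power(n, alpha, b=0.2, sigma=1.0):
                      se = sigma / math.sqrt(n)
                      z_crit = stats.norm.ppf(1 - alpha / 2)
                      # P(|b_hat/SE| > z_crit) when the true coefficient is b
                      return stats.norm.sf(z_crit - b / se) + stats.norm.cdf(-z_crit - b / se)

                  for n, alpha in [(100, 0.10), (100, 0.01), (100_000, 0.10), (100_000, 0.001)]:
                      print(f"n={n:>7}  alpha={alpha:<5}  power={power(n, alpha):.3f}")
                  ```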

                  In short, I see nothing wrong in considering the characteristics of the problem before deciding what cut-off to use; if anything, I think we should do it more often and not rely on the traditional 0.1, 0.05, and 0.01 rules that were popularized when the samples in use were much smaller than what is typical today. Of course, statistical significance does not mean practical relevance, and we may not like hypothesis tests at all (and in a way p-values are an alternative to hypothesis tests), but if we do a hypothesis test, I see nothing wrong in choosing a probability of type-I error according to the sample we have. Things become murky, however, when observations are not independent, and in that case criteria based solely on the sample size lose some appeal.

                  Best wishes and apologies for the long post,

                  Joao



                  • #10
                    Joao, thank you for the comment and for giving an argument for Deaton's/Leamer's position. Perhaps a concise way to express my issue with it is that it treats the p-value/test size as the main attraction and the hypothesis test as the point of a given statistical operation. I think effect sizes, standard errors, and the plausibility of the model are the most important things, so I see the above adjustments as a sideshow that distracts from the actually important task of designing and estimating reasonable models and interpreting their results to help us learn.

                    Framing the issue as a tradeoff between type I and type II errors is more reasonable, but if we're being honest, (I think) very few people actually interpret their regressions that way. Focusing on how many stars we are going to put next to our coefficients is what drives some people to say we should abolish statistical significance. I am not one such person, but when I see the amount of ink spilled over such irrelevant parts of statistical inference, I can surely understand their impulse.



                    • #11
                      Dear Jackson,

                      I may be missing something, but your position looks somewhat contradictory: if you care about effect sizes and standard errors, you implicitly care about t-statistics and therefore p-values. Also, if we do not adjust the size of the test when the sample increases, we just put more and more stars next to coefficients, and we agree that in itself is meaningless. So, being more stringent when working with larger samples is a way to avoid reaching false conclusions and finding that almost everything has a statistically significant effect. There is nothing magical about the numbers 0.01, 0.05, and 0.1, so we should think more about what significance levels to use rather than just using some number by default. As you say, many do not do that, and often I am guilty of it myself, but that does not mean we should get rid of a tool that can be very useful when used correctly.

                      Best wishes,

                      Joao



                      • #12
                        I am glad that many joined to help me and others who may have similar doubts. As Maarten Buis pointed out, my problem was that the sources I referred to gave me the view that "everything is significant when your sample is large enough indicates that you need to adjust for sample size". Thus, as Dave Giles put it,

                        "So, if the sample is very large and the p-values associated with the estimated coefficients in a regression model are of the order of, say, 0.10 or even 0.05, then this is really bad news. Much, much, smaller p-values are needed before we get all excited about 'statistically significant' results when the sample size is in the thousands, or even bigger."
                        (Source: https://davegiles.blogspot.com/2011/...data.html#more)

                        Since these are blogs, and many posts in them challenge the principles in the textbooks we read, I wondered whether this was correct, hence I posted here so that people from diverse fields could brainstorm.

                        Jackson Monroe put it very succinctly: I believed that a large N is basically "cheating", and hence that results can't be trusted unless authors make some adjustment to the level of significance to accommodate the sample size.

                        In fact, as Joao Santos Silva pointed out, Leamer used Bayesian arguments to derive the rule:

                        Leamer's result tells us that we should reject the null if F > (n/q)(n^(q/n) − 1); or equivalently, if qF = χ² > n(n^(q/n) − 1)

                        It's important to note that this result is based on a Bayesian analysis with a particular approach to the diffuseness of the prior distribution.
                        Source:https://davegiles.blogspot.com/2019/...have.html#more
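                        Out of curiosity, plugging the numbers from my original post into that rule (my own Python sketch, assuming a joint test of all q = 10 coefficients; these are not numbers from the blog):

                        ```python
                        # Leamer's rule: reject if qF = chi2 > n*(n**(q/n) - 1).
                        # n = 18,000 observations and q = 10 restrictions, as in my original post.
                        import math
                        from scipy import stats

                        n, q = 18_000, 10
                        cutoff = n * (n ** (q / n) - 1)              # threshold on the chi-square scale
                        print(f"Leamer chi2 cut-off = {cutoff:.1f}  (F cut-off = {cutoff / q:.2f})")
                        print(f"conventional 5% chi2({q}) critical value = {stats.chi2.ppf(0.95, q):.2f}")
                        print(f"approximation q*ln(n) = {q * math.log(n):.1f}")
                        ```

                        So on these numbers the rule asks for a chi-square statistic of roughly 98, rather than the conventional 18.3.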
                        "if anything, I think we should do it more often and not rely on the traditional 0.1, 0.05, and 0.01 rules that were popularized when the samples in use were much smaller than what is typical today."
                        I think this is the rationale those bloggers were arguing for: given the proliferation of huge data sets, the traditional cut-offs should not be followed blindly.

                        Other references: Lakens, D., 2018. Justify your alpha by decreasing alpha levels as a function of the sample size. The 20% Statistician Blog.

                        Lin, M., H. C. Lucas Jr., and G. Shmueli, 2013. Too big to fail: Large samples and the p-value problem. Information Systems Research, 24, 906-917.

                        From an empirical point of view, I have not seen papers in finance and economics that adjust for sample size, though many use large panel data sets. Hence the arguments for and against adjusting alpha based on the sample size might not have much relevance, at least in some disciplines.

