
  • Testing for difference in means across groups with large samples

    Hello all,

    I am trying to test for the difference in means across groups (10 groups, to be more precise) for certain variables.
    I have a very large sample (more than 1 million observations), and I have read, both here and in papers, that normal t-tests are not appropriate in such cases.
    I have also seen suggestions to use the command cobval, but it seems that command only works when the variable defining the groups takes two values, which is not my case.
    What, then, is the best way to test in Stata for differences in means across multiple groups, accounting for sample size? (See the sketch below.)

    Many thanks
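
    For reference, the standard joint test of equal means across several groups in Stata can be run with oneway or with a regression on group indicators; a minimal sketch, where y and group are placeholder names for the outcome and the group identifier:

        * One-way analysis of variance: F-test that all group means are equal
        oneway y group

        * Equivalent regression formulation; coefficients are differences
        * from the base group, and testparm runs the same joint test
        regress y i.group
        testparm i.group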

  • #2
    Tests are still "correct" in large samples; they just don't answer a question you care about. The null hypothesis is never strictly true, so if you look hard enough you will always find some minor deviations. Having a really large sample corresponds to looking really hard, so it comes as no surprise that in large samples you will always reject the null hypothesis. This is not wrong; it is the correct answer to the question you posed when using that test.

    That is fine. The purpose of a statistical test is much more limited than many people think. "Statistically significant" is not synonymous with "scientific" or "important". Instead, a test is there to deal with one limited but important problem: we humans are very good at imagining a pattern in random noise (think of a Rorschach test, https://en.wikipedia.org/wiki/Rorschach_test). The only purpose of a statistical test is to protect us from that.

    So the real analysis happens when we interpret the coefficients (in your case, the means), and the statistical tests only serve as a minor safety step. In large datasets that step is largely superfluous, so you can do it or not, and it probably won't make a difference. It also means that there is no need for an alternative: the test still works as it is supposed to. It is (as always) up to you to determine whether the differences are large enough to be substantively meaningful, and that is the real analysis.
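
    To make that concrete, a minimal sketch of what "interpreting the means" might look like in Stata (y and group are placeholder names):

        * Estimate the mean of y in each group and judge whether the
        * differences between them are substantively meaningful
        mean y, over(group)

        * Or via regression: margins reports the estimated group means
        regress y i.group
        margins group
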
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Dear Francisco Nobre,

      Adding to Maarten's helpful comments, I would say that one issue to consider is that the significance level should be a function of the sample size.

      The standard approach to statistical testing fixes the probability of rejecting a true null at a low level (e.g., 5%) because we treat the type-1 error as more serious than the type-2 error and therefore want to protect the null. In small samples this may lead to low power, but that is considered preferable to an incorrect rejection of the null.

      Keeping the size of the test fixed (say, at 5%) as the sample grows reduces the probability of a type-2 error. For very large samples, as in your case, we end up with a probability of a type-2 error that is virtually zero while the probability of the more serious type-1 error stays constant. That goes against the spirit of the procedure, which is based on the idea that the null should be protected.
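
      One way to see this concretely is with Stata's power command; the 0.01 standard-deviation difference below is an arbitrary, illustrative effect size:

          * Power of a two-sample test to detect a difference of 0.01 sd
          * with 1,000,000 observations in total (500,000 per group)
          power twomeans 0 0.01, n(1000000) sd(1)

      Even that tiny difference is detected with near-certainty at the 5% level.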

      The solution is to be more stringent in large samples. Suitable critical values adjusted for the sample size, based on Bayesian arguments, are provided by Ed Leamer (see page 114), and the same idea is implicitly used in the BIC.
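
      As a rough illustration of the BIC logic (a sketch of the idea, not Leamer's exact formula): adding q parameters improves the BIC only when the likelihood-ratio statistic exceeds q*ln(n), so the statistic can be compared with that sample-size-dependent threshold instead of the usual fixed critical value. Assuming 10 groups (q = 9 restrictions) and n = 1,000,000:

          * BIC-implied threshold versus the conventional 5% critical value
          local n = 1000000
          local q = 9
          display "BIC-implied chi2 threshold:     " `q'*ln(`n')
          display "Conventional chi2(9, 5%) value: " invchi2tail(9, .05)

      Here the BIC-implied threshold is about 124, far above the conventional value of about 16.9, which makes clear how much more stringent the sample-size-adjusted criterion is.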

      Best wishes,

      Joao



      • #4
        Many thanks to both!
