  • How to compute p-values from bootstrapping

    I am using the Current Population Survey (CPS) to compute a complex statistic that involves computing flows of people moving from employment to unemployment and then using those flows to compute the statistic I'm interested in.

    I need to bootstrap to compute confidence intervals, which I know how to do. I create bootstrap samples of the original individual-level data, compute flows, and compute the statistic I am interested in. From the resulting distribution of statistics, I take the 2.5th and 97.5th percentiles. Great.

    If I want a p-value for the hypothesis that the statistic is bigger than 0, can I just look at the percentile associated with the value 0? In other words, look at the mass of bootstrapped statistics that is above 0? I've read that I need to shift the bootstrapped observations so that their distribution has mean 0. If that is the case, I don't know how to do it, given that my original sample contains individual-level data.

    Thank you!


  • #2
    It is a bit more complicated than that. You need to modify the data to make the null hypothesis true. Then you repeatedly sample from the modified data, compute your statistic in each sample, and compute the proportion of samples for which the statistic is at least as large as the one you observed in the original data.
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------
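The recipe in #2 can be sketched for the simplest possible case, a sample mean tested against 0. This is only an illustration in Python, not Stata code and not the CPS flow statistic from the question: the function name `bootstrap_p_value_mean`, the toy data, and the choice to impose the null by subtracting the sample mean are all assumptions made for the example. For a more complex statistic, how to modify the data so that the null holds depends on the statistic, and this sketch does not address that.

```python
import random
import statistics


def bootstrap_p_value_mean(data, n_boot=2000, seed=0):
    """One-sided bootstrap p-value for H0: mean = 0 against H1: mean > 0.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(data)

    # Impose the null hypothesis: shift the data so its mean is exactly 0.
    null_data = [x - observed for x in data]
    n = len(null_data)

    # Resample from the null-imposed data and count how often the
    # resampled statistic is at least as large as the observed one.
    exceed = 0
    for _ in range(n_boot):
        resample = rng.choices(null_data, k=n)  # draw with replacement
        if statistics.fmean(resample) >= observed:
            exceed += 1

    # Adding 1 to numerator and denominator keeps the p-value above 0.
    return (exceed + 1) / (n_boot + 1)


# Toy data (made up): clearly positive mean, so the p-value is small.
sample = [1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0]
print(bootstrap_p_value_mean(sample))
```

Note that the resampling is done on the shifted data, not on the original data, which is the difference from the percentile confidence interval described in the question.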


    • #3
      The short version of my comment would be: Look at the -permute- command, which produces permutation-based p-values, and see if you can use it in a way that is appropriate to your sampling design. To my understanding, -permute- does not provide for cluster samples, but it does recognize stratification. There may be ways to implement clustering in a permutation test, but I don't know.

      The theoretical point here, often too little appreciated, is that bootstrapping simulates the sampling distribution of the statistic of interest that would occur if the alternative hypothesis were true, as estimated by the distribution in the sample. For a p-value, though, you need to know the distribution that would occur if the null hypothesis were true (per Maarten's comment), which is not likely to characterize the sample. Using the standard errors from a bootstrap procedure to make (say) a t-test would be conceptually incorrect, although it might produce results similar to those of a permutation test. Someone more knowledgeable than I am could comment on the situations in which those two sampling distributions would be similar. The purer answer, though, is that if you want a resampling-based p-value, you need some kind of permutation test.

      Personally, I'd be happy with the CI, but I understand that people of the p-value culture would not be satisfied.
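To make the permutation idea in #3 concrete, here is a minimal sketch of a two-group permutation test for a difference in means. It is written in Python for illustration and is not the syntax or implementation of Stata's -permute-. The function name `permutation_p_value` and the toy data are assumptions, and the sketch ignores stratification, clustering, and survey weights, all of which matter for CPS data.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """One-sided permutation p-value for mean(a) - mean(b) > 0.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    exceed = 0
    for _ in range(n_perm):
        # Under the null the group labels are exchangeable, so
        # reshuffling them generates the null distribution.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if diff >= observed:
            exceed += 1

    # Adding 1 to numerator and denominator keeps the p-value above 0.
    return (exceed + 1) / (n_perm + 1)


# Toy data (made up): group a is clearly larger than group b.
a = [10, 11, 12, 13, 14]
b = [0, 1, 2, 3, 4]
print(permutation_p_value(a, b))
```

Here the null hypothesis is imposed by shuffling group labels instead of by shifting the data, which is why no recentering step is needed.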


      • #4
        Arnau:
        as an aside to the previous helpful comments: many years ago, when the bootstrap seemed a one-size-fits-all remedy in many research fields, I was particularly fond of this article: Desgagné A, Castilloux AM, Angers JF, LeLorier J. The use of the bootstrap statistical method for the pharmacoeconomic cost analysis of skewed data. Pharmacoeconomics. 1998 May;13(5 Pt 1):487-97. I do hope that you find it useful.
        Kind regards,
        Carlo
        (Stata 19.0)
