  • Extracting p-values from Kruskal-Wallis test

    Hello,

    I have ~300 continuous variables and need to run a Kruskal-Wallis test on each of them across a 4-level variable. I have written a loop that runs the test for all of them, but what I really want is to output the p-values from all of the tables so that I can filter them easily. I looked at the scalars that kwallis returns, but only 3 are listed, and none of them is the p-value.

    Code:
    local kruskal var var2 var3 var4 var5 var6
    foreach var of local kruskal {
     kwallis `var', by(sptb_cst) 
    }
    How would I pull the p-values out of this?
    Thanks!
    Clare

  • #2
    Consider this

    Code:
    . sysuse census
    (1980 Census data by state)
    
    . kwallis medage, by(region)
    
    Kruskal-Wallis equality-of-populations rank test
    
      +--------------------------+
      | region  | Obs | Rank Sum |
      |---------+-----+----------|
      | NE      |   9 |   376.50 |
      | N Cntrl |  12 |   294.00 |
      | South   |  16 |   398.00 |
      | West    |  13 |   206.50 |
      +--------------------------+
    
    chi-squared =    17.041 with 3 d.f.
    probability =     0.0007
    
    chi-squared with ties =    17.062 with 3 d.f.
    probability =     0.0007
    
    . return list
    
    scalars:
               r(chi2_adj) =  17.06211085869512
                     r(df) =  3
                   r(chi2) =  17.0408088235294
    The P-value is not stored, but you can calculate it:

    Code:
    di chi2tail(r(df), r(chi2))
    .00069321



    • #3
      Thank you so much, Andrew! That worked. My only question, after reading the Stata manual entry for the chi2tail() function, is whether chi2tail() gives a one-sided test. If I'm interested in a two-sided p-value (I don't have a hypothesis about the direction in which my medians differ), would I then run this code:
      Code:
      di chi2(r(df), r(chi2))
      I got this code from reading this in the Stata manual:

      HTML Code:
      Chi-squared
          p = chi2(df, x)
          d = chi2den(df, x)
          q = chi2tail(df, x)
          x = invchi2(df, p)
          x = invchi2tail(df, q)
      HTML Code:
      2. The left-hand-side notation is used to assist in interpreting the meaning of the returned value:
          d  = density value
          pk = probability of discrete outcome K = Pr(K = k)
          p  = left cumulative
             = Pr(−infinity < statistic ≤ x) (continuous)
             = Pr(0 ≤ K ≤ k) (discrete)
          q  = right cumulative
             = 1 − p (continuous)
             = Pr(K ≥ k) = 1 − p + pk (
      Thanks!
      Clare



      • #4
        Another problem I'm having is that I can't get that code to run through the loop -- I can't seem to find a way to output all the calculated p-values into one document. The following code simply outputs the Kruskal-Wallis test for each variable into my results window, but doesn't create anything in Word.

        Code:
        foreach var of local kruskal {
        kwallis `var', by(sptb_cst)
        asdoc di chi2tail(r(df), r(chi2))
        }



        • #5
          The answer to #3 is, I believe, no: consider what you are asking here.

          A chi-square statistic is bounded below by 0. The two tails of a chi-square distribution therefore correspond to (1) agreement so poor, and chi-square so large, that the null is in doubt and (2) agreement so good, with chi-square near zero, that it may be "too good to be true", a phrase that is more than proverbial, as it can be found in the statistical literature too. (2) does arise fairly often -- one scenario is even fraud, or at least selection of results that were deemed suitable by those who collected the data -- but I've not seen cases where both tails arise at once.

          Kruskal-Wallis treats some groups higher than others as exactly equivalent to, as it were, other groups higher than some. So, in short, only a one-tailed test arises in your case, and if you need or want to look at detail on the groups, you need something else, say a focused graph or descriptive statistics.
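
          To make the tail question concrete, here is a minimal sketch using the statistic and degrees of freedom reported in #2. chi2() returns the left cumulative probability, so it is the complement of the p-value rather than a two-sided version of it.

          ```stata
          * chi2tail() is the upper-tail probability, i.e. the Kruskal-Wallis p-value
          display chi2tail(3, 17.0408088235294)

          * chi2() is the lower-tail (left cumulative) probability
          display chi2(3, 17.0408088235294)

          * the two are complements, so this matches the first line
          display 1 - chi2(3, 17.0408088235294)
          ```

          The first line reproduces the .00069321 shown in #2.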

          In #4 you are invoking asdoc, a versatile command I have never used. It is from SSC, as you are asked to explain (FAQ Advice #12: please explain where community-contributed commands you refer to come from).

          The authority on asdoc is naturally Attaullah Shah who wrote it, but what you are feeding to it is

          Code:
          di chi2tail(r(df), r(chi2))
          which is an instruction to print one number. So why does asdoc enter here at all? I am fairly sure that, with this syntax, asdoc will do nothing beyond that with the results of your main Kruskal-Wallis output.

          Different notes:

          1. 300 or so significance tests raise a problem of multiplicity, which you will need to address. If you're more than aware of that, fine.

          2. I think the implication of ties is that you should use the statistic adjusted for ties.
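
          Both notes can be folded into the loop. A minimal sketch, assuming the census example data from #2 rather than the real variables: it uses r(chi2_adj), the tie-adjusted statistic that kwallis returns, and flags p-values below a Bonferroni cutoff of 0.05 divided by the number of tests. Bonferroni is only one of several possible corrections and is chosen here purely for illustration.

          ```stata
          sysuse census, clear
          local kruskal "marriage medage divorce"
          * number of tests, used for the Bonferroni cutoff
          local m : word count `kruskal'

          foreach var of local kruskal {
              quietly kwallis `var', by(region)
              * p-value based on the statistic adjusted for ties
              local p = chi2tail(r(df), r(chi2_adj))
              display "`var'" _col(15) %6.4f `p' _col(25) cond(`p' < 0.05/`m', "below Bonferroni cutoff", "")
          }
          ```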
          Last edited by Nick Cox; 25 Oct 2019, 12:11.



          • #6
            Nick Cox has given you an excellent answer to your first question. If you want to export the P-values to MS Word, you can use esttab, which is part of the estout package by Ben Jann. First install estout from SSC.

            Code:
            ssc install estout
            Here is a reproducible example using my data example. You just need to adapt it to your case.

            Code:
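            sysuse census
            * Variables to test
            local kruskal "marriage medage divorce"
            local n: word count `kruskal'
            
            * Store each p-value in its own 1 x 1 matrix r1, r2, ...
            local i=1
            foreach var of local kruskal{
                kwallis `var', by(region)
                local pvalue: display %05.4f chi2tail(r(df), r(chi2))
                mat r`i'=  `pvalue'
                local ++i
            }
            
            * Build the row-join expression "\r2 \r3 ..."
            forval j=2/`n'{
                local all "`all' \r`j'"
            }
            * Stack the matrices into one column, label it, and export to an RTF file
            mat R= r1`all'
            mat rownames R = `kruskal'
            mat colnames R= "P-value"
            esttab mat(R) using myfile.rtf, nomtitle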
            For my 3 variables, this should result in the following table

            Code:
            . esttab mat(R), nomtitle
            
            -------------------------
                              P-value
            -------------------------
            marriage            .2521
            medage              .0007
            divorce             .3577
            -------------------------
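
            If the goal is to filter the p-values in Stata rather than in Word, another option is to collect them in a dataset. The following is only a sketch using the same example data; the handle memhold, the temporary file results, and the variable names varname, chi2_adj, df, and p are all made up for illustration. It uses the tie-adjusted statistic, so the p-values can differ slightly from the table above.

            ```stata
            sysuse census, clear
            local kruskal "marriage medage divorce"

            * open a temporary results file with one row per tested variable
            tempname memhold
            tempfile results
            postfile `memhold' str32 varname double(chi2_adj df p) using `results'

            foreach var of local kruskal {
                quietly kwallis `var', by(region)
                * copy the returned results before posting them
                local chi2 = r(chi2_adj)
                local df = r(df)
                post `memhold' ("`var'") (`chi2') (`df') (chi2tail(`df', `chi2'))
            }
            postclose `memhold'

            * load the collected results and sort by p-value for filtering
            use `results', clear
            sort p
            list, noobs
            ```

            Once the results are in memory, ordinary commands such as keep if p < 0.05 can be used to filter them.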



            • #7
              Thanks to Nick Cox for tagging me. Using Andrew Musau's code, here is how to output the matrix with asdoc:
              Code:
              sysuse census
              local kruskal "marriage medage divorce"
              local n: word count `kruskal'
              
              local i=1
              foreach var of local kruskal{
              kwallis `var', by(region)
              local pvalue: display %05.4f chi2tail(r(df), r(chi2))
              mat r`i'=  `pvalue'
              local ++i
              }
              
              forval j=2/`n'{
              local all "`all' \r`j'"
              }
              mat R= r1`all'
              mat rownames R = `kruskal'
              mat colnames R= "P-value"
              
              * And now output the matrix with asdoc to Word
              asdoc wmat, mat(R)

              Regards
              --------------------------------------------------
              Attaullah Shah, PhD.
              Professor of Finance, Institute of Management Sciences Peshawar, Pakistan
              FinTechProfessor.com
              https://asdocx.com
              Check out my asdoc program, which sends outputs to MS Word.
              For more flexibility, consider using asdocx which can send Stata outputs to MS Word, Excel, LaTeX, or HTML.



              • #8
                Attaullah Shah, @NickCox, and Andrew Musau, thank you so much for your comments! They all helped. I got Attaullah's asdoc code to work. It was helpful to think about the benefit of using a Kruskal-Wallis test here. I am doing exploratory analyses on biomarkers, however, so running significance tests on many at one time was my goal.

                Thanks again!!
                Clare McCarthy
