Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Quartile calculation is different from Excel? And trouble calculating quartiles by groups.

    Hi there,

    I am using Stata to calculate the median, 1st quartile, and 3rd quartile and then output to excel. It is part of a larger loop, so some of the variables are unique to the loop (`row', `q50', `q25, q`75')

    The code I am using:
    Code:
    local row = 2
    if d_HGBL == 1{
        putexcel set results_dd.xlsx, sheet (test) modify
        summarize HSCRP, detail
        local q50 = r(p50)
        putexcel B`row' = "Increase group Median =", right
        putexcel C`row'= `q50'
        local row = `row'+1
        local q25 = r(p25)
        putexcel B`row' = "Q25% =", right
        putexcel C`row'= `q25'
        local row = `row'+1
        local q75 = r(p75)
        putexcel B`row' = "Q75% =", right
        putexcel C`row'= `q75'
    }
    else{
        local row = `row'+3
        summarize HSCRP, detail
        local q50 = r(p50)
        putexcel B`row' = "Decrease group Median =", right
        putexcel C`row'= `q50'
        local row = `row'+1
        local q25 = r(p25)
        putexcel B`row' = "Q25% =", right
        putexcel C`row'= `q25'
        local row = `row'+1
        local q75 = r(p75)
        putexcel B`row' = "Q75% =", right
        putexcel C`row'= `q75'
    }
    The code only partially works. It calculates the median, 25%-tile, 75%-tile for the whole data set. But the data set is split into two groups defined by the variable d_HGBL. 1 is the Increase group and -1 is the Decrease group. Stata seems to be correctly ignoring the values that are empty.

    My output to Excel:

    Click image for larger version

Name:	Presentation1.jpg
Views:	1
Size:	1.70 MB
ID:	1729313


    The data set for HSCRP. The data is
    Code:
    . dataex HSCRP d_HGBL
    
    ----------------------- copy starting from the next line -----------------------
    
    
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double HSCRP float d_HGBL
       . -1
       . -1
     2.1 -1
       1 -1
       . -1
     3.8 -1
      .7 -1
     1.5 -1
      .2 -1
      .5 -1
       . -1
     2.2 -1
     1.8 -1
     8.5 -1
      .7 -1
     1.1 -1
      .4 -1
       1  1
     2.9  1
       .  1
      .3  1
       .  1
     2.9  1
       .  1
     1.5  1
       .  1
       .  1
     1.7  .
    14.7  .
       .  .
    end
    label values d_HGBL d_HGBLlab
    label def d_HGBLlab -1 "Decrease", modify
    label def d_HGBLlab 1 "Increase", modify
    ------------------ copy up to and including the previous line ------------------ Listed 30 out of 30 observations
    Regardless, Excel calculates the third quartile as 2.725 with QUARTILE.EXC and 2.375 as QUARTILE.INC.

    It seems I am not the only one who has issues with Excel calculating the third quartile:
    https://stackoverflow.com/questions/...ed-calculation
    https://superuser.com/questions/3433...on-doesnt-work
    https://www.reddit.com/r/excel/comme...tile_in_excel/

  • #2
    There are many different rules for calculating quantiles. You may have direct access to https://www.jstor.org/stable/2684934 and in any case copies of that paper can be found easily. IIRC, the paper doesn't get as far as bringing weights into the calculation, which Stata does allow.

    I am not a great fan of Excel for statistical calculation but I am optimistic that Excel will be found to use one of the documented rules. The differences between results for different rules can be especially noticeable for very small datasets.

    Comment


    • #3
      Thank you Nick!

      Any idea why my code is not working?

      There are two groups in dd_HGBL. I am trying to get the median, 25% percentile (1st quartile), and 75% percentile (3rd quartile) for HSCRP for each of the two groups in dd_HGBL. the dd_HGBL designation is "1" or "-1".

      Thanks for any tips!

      Comment


      • #4
        Your code is giving you incorrect results because the entire
        Code:
        if d_HGBL == 1 {
            // do stuff
        }
        else {
            // do stuff
        }
        does not do what you think it does, and is not appropriate to your problem. What it does is check the value of d_HGBL in the first observation in the data set. If that value is 1, then the quartiles for the entire data set are calculated and sent to Excel starting in row 2. If the value of d_HGBL in the first observation in the data set is not 1, then it skips to the -else- block, which then calculates the quartiles for the entire data set and sends them to Excel starting in row 7.

        What you need is:
        Code:
        local row = 2
        putexcel set results_dd.xlsx, sheet (test) modify
        summarize HSCRP if d_HGBL == 1, detail
        local q50 = r(p50)
        putexcel B`row' = "Increase group Median =", right
        putexcel C`row'= `q50'
        local row = `row'+1
        local q25 = r(p25)
        putexcel B`row' = "Q25% =", right
        putexcel C`row'= `q25'
        local row = `row'+1
        local q75 = r(p75)
        putexcel B`row' = "Q75% =", right
        putexcel C`row'= `q75'
        
        local row = `row'+3
        summarize HSCRP if d_HGBL == -1, detail
        local q50 = r(p50)
        putexcel B`row' = "Decrease group Median =", right
        putexcel C`row'= `q50'
        local row = `row'+1
        local q25 = r(p25)
        putexcel B`row' = "Q25% =", right
        putexcel C`row'= `q25'
        local row = `row'+1
        local q75 = r(p75)
        putexcel B`row' = "Q75% =", right
        putexcel C`row'= `q75'
        For a thorough explanation of the different -if-s in Stata and when and how to use each, see Cox NJ & Schechter CB, Stata tip 152: if and if: When to use the if qualifier and when to use the if command. The Stata Journal (2023) 23(2) 589-94, available at https://journals.sagepub.com/doi/abs/10.1177/1536867X231175349?journalCode=stja.

        Comment

        Working...
        X