Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding percentiles by year to a graph

    Dear Stata experts,

    I have a dataset including company returns over 16 years and want to graphically describe them using a graph with the yearly mean of company returns as well as some percentiles to give an additional indication of the distribution of the values. I added an image below to underline what should be displayed. I found calculating the mean easy, but did not find a way to calculate the yearly percentiles (ideally using an additional if condition to select one specific country).

    I am new to stata and managed to develop the following code so far. Can I somehow export those values to Excel and use them there, or even save and add those percentiles calculated into a graph in stata?

    forval Year=2000(1)2016 {
    display `Year'
    tabstat RoA if Year==`Year'&country=="DK", statistics(count p50 p25 p75)
    }
    Output:
    2014

    variable | N p50 p25 p75
    -------------+----------------------------------------
    RoA | 74 .7773741 .2041142 1.163145
    ------------------------------------------------------
    2015

    variable | N p50 p25 p75
    -------------+----------------------------------------
    RoA | 69 .866985 .366647 1.258994
    ------------------------------------------------------
    2016

    variable | N p50 p25 p75
    -------------+----------------------------------------
    RoA | 68 1.039269 .4520355 1.403584
    ------------------------------------------------------



    Click image for larger version

Name:	time series with percentiles.PNG
Views:	1
Size:	23.8 KB
ID:	1442333


    Thank you a lot for the support and sharing your knowledge!

  • #2
    You could use table and then "Copy as table" to excel. You can also use table's replace option, which will replace your dataset with the table (remember to preserve/snapshot!) and then export excel or similar commands.

    Code example:
    Code:
    clear all
    webuse nlswork
    table year if c_city == 1, c(p25 ln_wage p50 ln_wage p75 ln_wage) replace

    Comment


    • #3
      You can do all this in Stata without troubling any other program, or being troubled by it.

      This is a small twist on an example in the help of rangestat (SSC), which you must install before you can use it. Similarly you must install moremata (SSC) first.

      For a recent example see https://www.statalist.org/forums/for...ogram-in-stata

      For others, search the forum for e.g. rangestat quantile as keywords.

      A more prosaic alternative is just to use the corresponding egen functions.

      I have not understood how you intend to plot counts on the same graph, but the code calculates them too.

      Code:
      webuse nlswork, clear
      
      * -ssc inst moremata- needed for -mm_quantile()-
      mata:  
          mata clear
          real rowvector myquantile(real colvector X) {
          if (rows(X) < 3) return(.)
          return(mm_quantile(X, 1, (0.75, 0.5, 0.25)))
          }
       end
      
      * -ssc inst rangestat- needed first
       rangestat  (mean) ln_wage (count) ln_wage (myquantile) ln_wage, interval(age 0 0)
      
       label var ln_wage_mean "mean"
       label var myquantile1 "p75"
       label var myquantile2 "p50"
       label var myquantile3 "p25"
              
      set scheme s1color
      twoway connected ln_wage_mean myquantile? age, sort legend(col(1) pos(3)) ///
      ytitle("`: var label ln_wage'") yla(, ang(h)) xla(15(5)45) ms(+ Oh ..)
      Click image for larger version

Name:	quantile_mean.png
Views:	1
Size:	36.0 KB
ID:	1442349


      Last edited by Nick Cox; 01 May 2018, 14:14.

      Comment


      • #4
        Thank you Ariel and Nick!


        @Nick: Your approach seems very smooth and advantageous. However, I fail to get the rangestat command working when trying to replicate the results in my project. Stata creates new variables but without observations it it.

        mata:
        mata clear
        real rowvector myquantile(real colvector X) {
        if (rows(X) < 3) return(.)
        return(mm_quantile(X, 1, (0.75, 0.5, 0.25)))
        }
        end

        rangestat (mean) RoA (count) RoA (myquantile) RoA if country=="DK", interval(Year 2000 2016)
        Do you have an idea what I'm doing wrong?
        Last edited by Hendrik Tacke; 01 May 2018, 15:32.

        Comment


        • #5
          The syntax

          Code:
          interval(year 2000 2016) 


          doesn't do what you think. It calculates for the interval [year + 2000, year + 2016] and there are no such values in your dataset. The numbers given are offsets, not values. That is why my code was

          Code:
          interval(age 0 0) 


          and not

          Code:
          interval(age 14 46) 


          to get summaries for each distinct (integer) age.

          Comment


          • #6
            Perfect, thank you! I tried that yesterday as well but got the following error:

            mm_quantile(): 3351 argument has missing values
            mm_quantile(): - function returned error
            myquantile(): - function returned error
            do_flex_stats(): - function returned error
            <istmt>: - function returned error

            I just fixed it by adding
            if RoA!=.
            to rangestat today to ignore those missing values (I hope this does not change the results). The output looks great!

            Comment


            • #7
              Thanks for the signal. The code needs an extra line as a vector with 3 or more values is necessary but not sufficient. Something like

              Code:
              real rowvector myquantile(real colvector X) {
                  select(X, !rowmissing(X))
                  if (rows(X) < 3) return(.)
                  return(mm_quantile(X, 1, (0.75, 0.5, 0.25)))
              }
              Your work-around is equivalent.

              Comment

              Working...
              X