  • Can somebody explain how the code from -egen,pctile- below works to calculate a percentile?

    Good afternoon,

    I do not understand how the code from -egen, pctile- works. Can somebody explain what the portion of code in green does?

    Code:
               mark `touse' `if' `in'
                    gen double `x' = `exp' if `touse'
    
                    sort `touse' `by' `x'
                    tempvar N
                  by `touse' `by': gen long `N' = sum(`x'!=.)
                    local rj "round(`N'[_N]*`p'/100,1)"
                    #delimit ;
                    by `touse' `by': gen `typlist' `varlist' =
                            cond(100*`rj'==`N'[_N]*`p',
                            (`x'[`rj']+`x'[`rj'+1])/2,
                            `x'[int(`N'[_N]*`p'/100)+1]) if `touse' ;
                    #delimit cr
    I cannot relate this code in any way to the description in the Methods and Formulas section of -pctile-.

  • #2
    Code:
      by `touse' `by': gen long `N' = sum(`x'!=.)   // create an index of values, within by groups, from 1 to N. This is the equation for N with weights equal to 1 from -pctile-.
      local rj "round(`N'[_N]*`p'/100,1)"           // When used will evaluate to the (closest) index to the requested percentile.
      #delimit ;
      by `touse' `by': gen `typlist' `varlist' =
          cond(100*`rj'==`N'[_N]*`p',               // condition:  requested percentile matches exactly with empirical percentile
          (`x'[`rj']+`x'[`rj'+1])/2,                // if true: average x(i-1) and x(i). This is the top condition for x[p].  With unit weights `rj' equals W(i-1), so `x'[`rj'] is x(i-1).
          `x'[int(`N'[_N]*`p'/100)+1]) if `touse' ; // else:    x(i); the bottom condition for x[p].
      #delimit cr
    Specifically, this implements the default formula described in the Methods and Formulas for -pctile-, forcing weights to be equal to 1.
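
    As a minimal sketch of how those lines behave, the toy example below reproduces both branches of the cond() on four values and checks them against -_pctile-. The data are made up, the permanent variables x and N stand in for the temporary ones, and there are no by-groups.

    ```stata
    * Toy data (hypothetical): four values, so N = 4 once sorted
    clear
    input x
    1
    2
    3
    4
    end
    sort x
    gen long N = sum(x != .)        // running count of nonmissing x; N[_N] is the total

    foreach p in 40 50 {
        * Same expression as in the -egen, pctile- code above, minus the by-groups
        local rj "round(N[_N]*`p'/100,1)"
        gen double pct`p' = cond(100*`rj' == N[_N]*`p', ///
            (x[`rj'] + x[`rj'+1])/2,                    ///
            x[int(N[_N]*`p'/100)+1])
        * Compare with the built-in command
        _pctile x, p(`p')
        assert pct`p' == r(r1)
    }
    list
    ```

    With p = 50, P = 4*50/100 = 2 is a whole number, so the first branch averages x[2] and x[3] and gives 2.5. With p = 40, P = 1.6, so the second branch returns x[int(1.6)+1] = x[2] = 2.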


    • #3
      Thank you very much Leonardo Guizzetti !

      Unfortunately I still cannot make it work for the case with weights. I am benchmarking my results to Ulrich Kohler's -egen, _gwpctile()- from egenmore, and the results are slightly different.

      If anybody can see where I am messing this up, it would be awesome. The program up to the code in red is just housekeeping, so I guess I do not have errors there. But something in the code in red is not doing what it needs to do to replicate the definition from the Methods and Formulas.

      Code:
      *! version 1:   25 April 2021, Joro Kolev
      program define _gwpcti
              version 11, missing
              syntax newvarname =/exp [if] [in] [, p(real 50) BY(varlist) Weights(varname)]
              if `p'<=0 | `p'>=100 { 
                      di in red "p(`p') must be between 0 and 100"
                      exit 198
              }
      
              tempvar touse x myweights
      
              quietly {
                      mark `touse' `if' `in'
                      gen double `x' = `exp' if `touse'
                      gen double `myweights' = cond(!missing(`x'),cond(!missing("`weights'"),`weights'+0,1),.)
      
                      if "`by'"=="" {
                              _pctile `x' if `touse' [aw=`myweights'], p(`p')
                              gen `typlist' `varlist' = r(r1) if `touse'
                              exit
                      }
      
                      sort `touse' `by' `x'
                      tempvar N n
                      by `touse' `by': gen long `N' = sum(`myweights')
                      by `touse' `by': gen `n' = _n if `N'[_n+1]>`N'[_N]*`p'/100
                      sort `touse' `by' `n'
      
                      local rj "round(`N'[1]*`p'/100,1)"
                      
                      by `touse' `by': gen `typlist' `varlist' =    ///
                              cond(100*`rj'==`N'[1]*`p',            ///
                              (`x'[1]+`x'[2])/2,            ///
                              `x'[2]) if `touse'
                 
              }
      end


      • #4
        With reference to the Methods and Formulas section of the description of pctile in the [D] PDF, my understanding, not expressed as clearly as I would like, is
        • the temporary variable `x' corresponds to the values of x sorted in increasing order (within each bygroup of `touse' and `by')
        • the local macro `p' corresponds to the percentile p
        • the temporary variable `N' corresponds to W(i) the cumulative count of non-missing values of x (within each bygroup)
        • the expression in the local macro `rj' evaluates to i-1, the last index for which `N' (that is, W(i-1)) does not exceed `N'[_N]*`p'/100 (P) (within each bygroup), whenever P is a whole number
          • if W(i-1) is exactly equal to P then the percentile is midway between `x'[i-1] and `x'[i]
          • if W(i-1) is less than P then the percentile is `x'[i], which the code picks up as `x'[int(P)+1]
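
        The same reading should carry over to weights if `N' is replaced by a running sum of the weights. Here is a rough sketch on made-up data that simply looks for the first i with W(i) > P. The names w, W, P, idx and pct are my own, not from any ado-file, and there are no by-groups.

        ```stata
        * Toy data (hypothetical): values x with integer weights w
        clear
        input x w
        10 1
        20 2
        30 1
        40 4
        end
        sort x
        gen double W = sum(w)                  // W(i): running sum of the weights

        foreach p in 25 50 {
            scalar P = W[_N]*`p'/100           // P = N*p/100, with N = W(n)
            gen long idx = _n if W > P         // observations whose W(i) exceeds P
            summarize idx, meanonly
            local i = r(min)                   // first index i with W(i) > P
            if `i' > 1 & W[`i'-1] == P {
                scalar pct = (x[`i'-1] + x[`i'])/2   // W(i-1) == P: average the two neighbours
            }
            else {
                scalar pct = x[`i']                  // otherwise: x(i)
            }
            display "p = `p': i = `i', percentile = " pct
            drop idx
        }
        ```

        Here W is 1, 3, 4, 8. For p = 25, P = 2 and the first W(i) above it is at i = 2, giving 20. For p = 50, P = 4 equals W(3), so the result is (30+40)/2 = 35. The exact test W(i-1) == P is only dependable in this sketch because the weights are integers; with fractional weights, floating-point rounding could send it down the other branch.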
