  • limiting data between 1st and 98th percentile

    Hello, I am using
    Code:
    summarize co2eqavg3, detail
    keep if inrange(co2eqavg3, r(p1), r(p98)) | missing(co2eqavg3)
    to limit the values of the co2eqavg3 variable to the range between the 1st and 98th percentiles while keeping the missing values. However, when I run this, Stata says: (0 observations deleted). How is this possible? When I limit it to the 1st and 99th percentiles, it removes some values, but when I set the upper limit to the 98th percentile, it does not drop anything.

  • #2
    From the output of help summarize:

    Code:
    Stored results
    
        summarize stores the following in r():
    
        Scalars   
          r(N)           number of observations
          r(mean)        mean
          r(skewness)    skewness (detail only)
          r(min)         minimum
          r(max)         maximum
          r(sum_w)       sum of the weights
          r(p1)          1st percentile (detail only)
          r(p5)          5th percentile (detail only)
          r(p10)         10th percentile (detail only)
          r(p25)         25th percentile (detail only)
          r(p50)         50th percentile (detail only)
          r(p75)         75th percentile (detail only)
          r(p90)         90th percentile (detail only)
          r(p95)         95th percentile (detail only)
          r(p99)         99th percentile (detail only)
          r(Var)         variance
          r(kurtosis)    kurtosis (detail only)
          r(sum)         sum of variable
          r(sd)          standard deviation
    There is no r(p98). When I try the command you show, in Stata 17.0, I am told there is a syntax error.


    • #3
      There is no r(p98) after summarize, detail, so Stata interprets r(p98) as a missing value (.). Your second command therefore translates to -keep if inrange(co2eqavg3, r(p1), .) | missing(co2eqavg3)-.

      Now, -inrange(x, a, .)- is equivalent to -x >= a & !missing(x)-. So you are asking it to keep any observation where co2eqavg3 is missing or is greater than or equal to the 1st percentile. Assuming that your 1st percentile is equal to the minimum value, that means that everything gets kept.
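
      To check this on a small example, here is a minimal sketch. It uses the auto dataset shipped with Stata and its price variable as stand-ins (both are my choice for illustration, not from the question). The 1st percentile is copied into a local macro because count is itself r-class and would overwrite the results left by summarize. With a missing upper bound, inrange() and the explicit comparison should give the same count.

      Code:
      sysuse auto, clear
      summarize price, detail
      * r(p98) is not among the stored results, so this should show a missing value (.)
      display r(p98)
      * save the 1st percentile before another r-class command replaces r()
      local p1 = r(p1)
      * a missing upper bound means no upper limit is applied
      count if inrange(price, `p1', .)
      count if price >= `p1' & !missing(price)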

      Added: Crossed with #2.

      If you need to get the 98th percentile, you have to use a different command: -centile-.
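
      A minimal sketch of that approach, assuming you still want to keep the missing values. centile stores the requested centiles in r(c_1), r(c_2), and so on, in the order they are listed in the centile() option, so here r(c_1) is the 1st and r(c_2) the 98th. Note that the estimates from centile may differ slightly from the percentiles that summarize, detail reports.

      Code:
      * request the 1st and 98th centiles; they are stored as r(c_1) and r(c_2)
      centile co2eqavg3, centile(1 98)
      * keep values within that range, plus observations where the variable is missing
      keep if inrange(co2eqavg3, r(c_1), r(c_2)) | missing(co2eqavg3)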
      Last edited by Clyde Schechter; 03 May 2022, 18:31.
