  • Finding the percentile that corresponds to a certain value

    Hello,
    I'm facing a challenge with a variable transformation. I've transformed a variable into percentiles, which has made all the numbers positive. Consequently, I've lost track of whether each value was an increase or a decrease. What I'm trying to achieve is to create a vertical line in my graph between positive and negative values of the variable of interest (prior to the transformation into percentiles). To accomplish this, I need to find the percentile that corresponds to 0 in the distribution of my variable.

    Code:
    sort `xvar'_`change1'
    qui egen `xvar'_`change1'_cdf = xtile(`xvar'_`change1'), nq(100)
    qui replace `xvar'_`change1'_cdf = round(`xvar'_`change1'_cdf)


    In this code, I'm generating percentiles for each observation. What I'm looking to do is create a new local variable, let's call it zero_`xvar'_`change1', which should be equal to the percentile value of `xvar'_`change1' when it equals 0.

    Additionally, I'm curious if there's an alternative method to directly set the demarcation line in the graph using percentiles on the x-axis.

    Thank you for your assistance!

  • #2
    There is some challenging use of terminology here. By percentiles you seem to mean variously percentile bins or their labels, sometimes called percentile ranks. By local variable you mean local macro.

    That is, a percentile such as the 75% percentile is a value on a variable, or calculated from such values. 0.75 or 75% is not the percentile itself but a cumulative probability.

    More crucially, there is in general -- even when a variable could be negative, zero, or positive -- no guarantee that it is ever exactly zero, nor that zero occurs only once.

    I would get three counts, namely

    Code:
    count if foo < 0
    local nneg = r(N)
    count if foo == 0
    local nzero = r(N)
    count if foo > 0 & foo < .
    local npos = r(N)
    local wanted = 100 * (`nneg' + 1/2 * `nzero') / (`nneg' + `nzero' + `npos')
    So the wanted percent is the percent of negative numbers PLUS 1/2 the percent of zeros.
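
    As a quick check of that arithmetic, here is a minimal sketch on made-up data. The variable name foo and the values are purely illustrative and are not taken from the data in this thread. There are four negatives, two zeros, four positives and one missing value, so the result should be 100 * (4 + 1) / 10 = 50. The missing value is there to show why the last count needs foo < . as well.

    ```stata
    * toy data (hypothetical): 4 negatives, 2 zeros, 4 positives, 1 missing
    clear
    input foo
    -3
    -2
    -1.5
    -0.5
    0
    0
    1
    2
    4
    7
    .
    end

    count if foo < 0
    local nneg = r(N)
    count if foo == 0
    local nzero = r(N)
    * missing counts as larger than any number, so exclude it explicitly
    count if foo > 0 & foo < .
    local npos = r(N)

    * percent of negatives plus half the percent of zeros
    local wanted = 100 * (`nneg' + 1/2 * `nzero') / (`nneg' + `nzero' + `npos')
    display "percent position of zero: " `wanted'
    ```

    Dropping the foo < . condition would count the missing value as positive and pull the answer below 50.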

    I don't think there is an easy way to exploit previous sorting, as you still should check for multiple zeros.

    I can't see your data but enthusiasts for quantile binning are often frustrated by the results. Problems start with the fact that tied values must be assigned to the same bin, so the pattern of frequencies may be decidedly lumpy. Identifying 100 quantile bins is only going to work well if there are no or very few ties and very much more than 100 values. That may be true of your data.

    A direct method to get at so-called plotting positions would be

    Code:
    egen rank = rank(foo)
    count if foo < .
    gen ppos = (rank - 0.5) / r(N)
    followed with multiplication by 100 as desired.

    That maps distinct values to distinct plotting positions with only one arbitrary choice, the choice of 0.5. See https://www.stata.com/support/faqs/s...ting-positions for more detail. That rule treats the tails of the distribution symmetrically.
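
    If the graph has plotting positions on the x-axis, the demarcation line could be drawn roughly as follows. This is only a sketch under assumptions: the auto dataset and the shift by 6000 are stand-ins chosen to get a variable with both negative and positive values, and the graph in question may look quite different. Because rank() in egen assigns average ranks to ties by default, any zeros in the data get the plotting position (nneg + nzero/2) / N, which is exactly the percent computed from the three counts, so the two approaches agree.

    ```stata
    * stand-in data (hypothetical): a variable that takes both signs
    sysuse auto, clear
    gen foo = price - 6000

    * percent position of zero from the three counts
    count if foo < 0
    local nneg = r(N)
    count if foo == 0
    local nzero = r(N)
    count if foo > 0 & foo < .
    local npos = r(N)
    local wanted = 100 * (`nneg' + 1/2 * `nzero') / (`nneg' + `nzero' + `npos')

    * plotting positions on a percent scale
    egen rank = rank(foo)
    count if foo < .
    gen ppos = 100 * (rank - 0.5) / r(N)

    * vertical line where the variable changes sign
    scatter foo ppos, xline(`wanted') yline(0) xtitle("Percent position") ytitle("foo")
    ```

    The horizontal line at 0 is optional, but it makes it easy to see that the vertical line falls where the values cross from negative to positive.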

    I am not sure that this answers your question. I don't fully understand your data or the graph you're trying to draw. Nor do I understand why you are rounding values as the output of the xtile() function of egen -- which comes from egenmore (SSC), as you're asked to explain -- is a set of integers.
    Last edited by Nick Cox; 24 Jan 2024, 10:09.
