  • Finding the 90th and 10th percentiles

    I'm trying to find the 10th and 90th percentiles for a set of polling booth observations in multiple towns. An example is as follows:

    Town_id    Voters
    10         100
    10         90
    10         120

    11         95
    11         34
    11         98

    As you can see, each polling booth in a town has a different number of voters. I need the 10th and 90th percentiles of voters within each town (10, 11, 12, etc.).

    Any ideas?

  • #2
    Plenty of ideas, but make sure you describe the task informatively.

    Here is how I read it:
    Code:
    sysuse nlsw88, clear
    
    * one row per occupation; columns hold the 10th and 90th percentiles
    levelsof occupation, local(levs)
    matrix C=J(`:word count `levs'',2,.)
    
    local i=1
    foreach lev in `levs' {
      quietly centile wage if occupation==`lev', c(10 90)
      * r(c_1) and r(c_2) are the centiles in the order requested
      matrix C[`i',1]=r(c_1)
      matrix C[`i',2]=r(c_2)
      local ++i
    }
    
    matrix list C, f(%6.2f)
    In your case, specify town_id instead of occupation and your voter turnout variable instead of wage.

    Best, Sergiy Radyakin

    • #3
      Thanks. Just to make sure I've got this right: columns c1 and c2 of the result matrix hold the 10th and 90th percentiles, respectively?

      What if I wanted to find out the mean of voter turnout by town_id? Would I use a simple egen mean by town, or would I similarly have to create a matrix?

      • #4
        Re c_1, c_2: see help centile. r(c_1) and r(c_2) hold the centiles in the order requested, so with c(10 90) they are the 10th and 90th percentiles.
        Re the mean: it depends. "Find out" is understood differently by different people. For some it is enough to see a number on the screen; for others, further processing by a program is required. If you are in the first group, remove the matrix altogether, along with the quietly:
        Code:
        sysuse nlsw88, clear
        levelsof occupation, local(levs)
        foreach lev in `levs' {
          centile wage if occupation==`lev', c(10 90)
        }
        or better yet
        Code:
        . table occupation, c(p10 wage mean wage p90 wage) format(%6.2f)
        
        -----------------------------------------------------------
                    occupation |  p10(wage)  mean(wage)   p90(wage)
        -----------------------+-----------------------------------
        Professional/technical |       5.03       10.72       16.18
                Managers/admin |       4.15       10.90       19.36
                         Sales |       3.62        7.15       10.84
            Clerical/unskilled |       3.01        8.52       17.03
                     Craftsmen |       3.53        7.15       12.23
                    Operatives |       2.90        5.65        8.87
                     Transport |       1.72        3.20        5.64
                      Laborers |       2.70        4.91        7.52
                       Farmers |       8.05        8.05        8.05
                 Farm laborers |       1.81        3.08        4.18
                       Service |       3.50        5.99        8.95
             Household workers |       6.17        6.39        6.61
                         Other |       4.11        8.84       13.76
        -----------------------------------------------------------
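
        A note for readers on newer versions: as far as I know, table was rewritten in Stata 17, and the c() option above belongs to the older syntax. A minimal sketch of what should be an equivalent call, assuming Stata 17 or later (nototals is used here only to suppress the extra Total row; the old form should still run under version control):

        ```stata
        sysuse nlsw88, clear

        * Stata 17+ syntax: one statistic() option per requested statistic
        table occupation, statistic(p10 wage) statistic(mean wage) statistic(p90 wage) nformat(%6.2f) nototals

        * older syntax, run under version control
        version 16: table occupation, c(p10 wage mean wage p90 wage) format(%6.2f)
        ```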


        Sergiy Radyakin

        • #5
          Note that egen also offers percentile functions.
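
          A minimal sketch of that route, using nlsw88 as a stand-in (occupation plays the role of town_id and wage the role of voter turnout; the new variable names p10, p90 and mean_wage are just illustrative choices):

          ```stata
          sysuse nlsw88, clear

          * each new variable is constant within occupation
          egen p10 = pctile(wage), p(10) by(occupation)
          egen p90 = pctile(wage), p(90) by(occupation)
          egen mean_wage = mean(wage), by(occupation)

          * show one value per group
          tabdisp occupation, c(p10 mean_wage p90) format(%6.2f)
          ```

          Because the results are stored as variables, they stay in the data set and can be listed or graphed later.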

          • #6
            That's a fantastic solution. By "find out", I mean that I need the results to be part of the data set, as separate variables, so that I can plot graphs with them. I assume that would require different code, unless I can somehow transfer the above results into the data set?

            • #7
              You have already answered your own question, at least in part.

              You mentioned egen's mean() function, which will put results in new variables, and I added a pointer to its percentile functions.

              Beyond that, graph bar, graph hbar, graph dot will work out means, p10, p90, etc., etc. without your needing to have the results in separate variables.
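
              A sketch of the graph route, again with nlsw88 standing in for the real data (occupation for town_id, wage for voter turnout). The statistics are computed by the graph command itself, so no new variables are needed:

              ```stata
              sysuse nlsw88, clear

              * one row per occupation, with a marker for each statistic
              graph dot (p10) wage (mean) wage (p90) wage, over(occupation)
              ```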

              • #8
                Thanks. I've been combing through the help for egen, but I can't see how to display the mean and percentile results by town_id when using egen. That seems a bit odd.

                • #9
                  What code did you try? What happened? What was right or wrong?

                  It is best to try code and find out, not to wonder what's possible, whether that's odd, or whatever.

                  If you create new variables, they won't be displayed automatically when you create them. But you can list them, graph them, tabulate them.

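                  Code:
                  sysuse auto, clear
                  * group mean, repeated on every observation in the group
                  egen mean = mean(mpg), by(rep78)
                  sort rep78
                  * inspect the new variable, then show one value per group
                  edit rep78 mpg mean
                  tabdisp rep78, c(mean)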

                  • #10
                    For the 10th percentile, I've been trying the following (or variants of it):

                    Code:
                    by town_id: egen p10 = pctile(voter_turnout), p(10)
                    But I receive the error 'not sorted'.

                    Your mean code worked like a charm.

                    • #11
                      If you do it your way, then you must

                      Code:
                      sort town_id
                      first, or at the same time

                      Code:
                      bysort town_id: egen p10 = pctile(voter_turnout), p(10)
                      As the help for by says, very early on:

                      by without the sort option requires that the data be sorted by varlist

                      • #12
                        The problem I'm facing with this code is that, unlike your mean code, the 10th percentile egen code calculates one value for all towns, instead of a separate 10th percentile for each town.

                        So where your mean code gives a result something like this (ignore the voter turnout observations):

                        Code:
                        town_id    Voter Turnout    Mean
                        171        X                10
                        171        Y                10
                        171        Z                10
                        182        A                20
                        182        B                20
                        182        C                20
                        The 10th percentile code gives a result like this:

                        Code:
                        town_id    Voter Turnout    10th Percentile
                        171        X                15
                        171        Y                15
                        171        Z                15
                        182        A                15
                        182        B                15
                        182        C                15

                        • #13
                          You haven't shown us anything obviously wrong. It's quite possible for different towns to have similar, even identical, results.
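
                          One way to check is to compare the egen results with an independent by-group calculation. A sketch on nlsw88, with occupation standing in for town_id and wage for voter_turnout (the variable names p10 and tag are illustrative):

                          ```stata
                          sysuse nlsw88, clear
                          bysort occupation: egen p10 = pctile(wage), p(10)

                          * independent per-group calculation to compare against
                          tabstat wage, by(occupation) statistics(n p10)

                          * one row per group showing the egen result
                          egen tag = tag(occupation)
                          list occupation p10 if tag, noobs

                          * how many distinct values did p10 take?
                          tabulate p10
                          ```

                          If tabulate shows a single value while tabstat shows different percentiles across groups, something went wrong in the egen call; if both agree, the groups really do share that percentile.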
