Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Dummy variable & decile ranking

    Dear Statalist-Community,
    I am looking for a way to generate a dummy variable which needs to show the value "1" for reflecting the membership in the top decile group of brokerage houses regarding the number of employed analysts in year t and else "0". Where the "top" decile (top 10%) is defined as having the most analysts employed. The data set is a panel with years, analysts, brokerage houses, firms and forecasts etc.

    I have already created an auxiliary variable "b_employees" which shows the number of analysts a broker employs in t. This number can be different from year to year. Now with that variable as basis, a new variable reflecting the above mentioned deciles should be created and a decile number being assigned to the observations.The top decile brokers having "1" assigned, while all other borkers get "0".

    I am using Stata 12.1

    Thanks in advance.
    Thomas


  • #2
    Dear Stata-Community,
    I have done some more research and trial'n error, but are now somehow stuck. Originally I wanted to have deciles reflecting exact percentages (e.g. 10%-decile, ...90%-decile), but regarding the inhomogenous number of the value-groups within my variable "b_employees" (= # of Broker employees of broker z in year t) from year to year, I decided to go with the xtile-command.

    The issue is the same: ranking those b_employees numbers in deciles. At the end the top decile (brokers with most employees) gets a "1" and the other nine deciles a "0".
    The following Code (as intermediate step) does actually what I want, but just for the year 1985, which I explicitely specify. The goal is to have the code executed for every year, so that every year gets its own decile ranking of variable "b_employees" expressed in one variable "decile". Unfortunaley xtile does not support bysort.

    Code:
    xtile decile = b_employees if year==1985, nq(10)
    I think for the Stata pros among you it's a breeze. ;-)
    For me as newbee it's haystack-needle-issue with having already crawled through manuals and Statalist-threads.

    Also, another issue with above decile-generation has arisen:
    Stata generates sometimes only 9 quantiles, which are obviously no "deciles" anymore , though I explicitely defined to have ten quantiles via "nq(10)". When I use "nq(11)" in those cases it does produce ten deciles. Unfortunately the behavior changes from year to year, means sometimes I get the desired ten quantiles and sometimes just nine quantiles with "nq(10)".
    How come ?
    Is there a way to force generation of exact ten quantiles ?

    In this regard, there are actually enough b_employee-values (namely 35) in e.g. year 1985 to create exact ten deciles. Of course the 35 values cannot evenly distributed among the deciles, but that's OK for that purpose.

    In the next step the max decile (numbered usually with "10") shall get the value "1" assigned, while the other deciles get a "0".

    Thank you very much in advance.

    Thomas

    Comment


    • #3
      I don't find your write-up very clear; however, if the first problem is just how to get several of the decile variables, the following just builds on your code:
      Code:
      forval i=1985/1995 {
      xtile decile`i' = b_employees if year==`i', nq(10)
      }
      note that I don't see any place where you say what years you want this for so I have made that part up

      my guess re: the number of quantiles is that you have ties in your data - do you?

      Comment


      • #4
        Thanks, Rich, for your feedback and sorry to be not concise. Your code did nicely. The only thing to adjust is the fact that separate variables "decile1985" up to "decile2013" have been created. Each of those decileXXXX variables just shows deciles for their year (e.g. 1985) with all other years (e.g. 1986-2013) showing missings. What those variables show separately, I need to have being aggregated in one single variable called "decile". So no missings will occur and all years are filled with their deciles. To clarify please look at the following, which reflects how the data is arranged. What I need is the fourth column with variable “decile”:

        Code:
        id   year b_employee decile
        1    1985      10      1
        2    1985      10      1
        3    1985      15      1
        4    1985      30      2
        5    1985      35      2
        6    1985      40      2
        ....
        100  1985      150    10
        101  1985      150    10
        102  1985      155    10
        103  1985      160    10
         
        104  1986      10      1
        105  1986      12      1
        106  1986      15      1
        107  1986      25      2
        108  1986      29      2
        109  1986      33      2
        ....
        250  1986      120    10
        251  1986      125    10
        252  1986      133    10
        ....
        
        xxxx 2013      ...    ...

        Regarding the potential ties in the data and number of quantiles generated: yes, as you can see below the deciles of the single years (here 1985 & 1986) are unevenly filled with observations. For 1985 your code produces (as did mine xtile-line) nine quantiles and in 1986 ten quantiles.

        HTML Code:
           . tab decile1986
                     10 |
              quantiles |
            of Bempl_N  |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      1 |      1,267       10.96       10.96
                      2 |      1,401       12.12       23.08
                      3 |        950        8.22       31.29
                      4 |      1,041        9.00       40.30
                      5 |      1,385       11.98       52.28
                      6 |      1,005        8.69       60.97
                      7 |      1,622       14.03       75.00
                      8 |      1,089        9.42       84.42
                      9 |      1,801       15.58      100.00
            ------------+-----------------------------------
                  Total |     11,561      100.00

        HTML Code:
           . tab decile1987
                     10 |
              quantiles |
            of Bempl_N  |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      1 |      1,571       11.60       11.60
                      2 |      1,713       12.65       24.25
                      3 |      1,060        7.83       32.08
                      4 |      1,186        8.76       40.84
                      5 |      1,343        9.92       50.75
                      6 |      1,516       11.19       61.95
                      7 |      1,568       11.58       73.53
                      8 |      1,267        9.36       82.88
                      9 |      1,390       10.26       93.15
                     10 |        928        6.85      100.00
            ------------+-----------------------------------
                  Total |     13,542      100.00

        Thanks,

        Thomas

        Comment


        • #5
          sorry, the last two setences should read of course as follows:
          "yes, as you can see below the deciles of the single years (here 1986 & 1987) are unevenly filled with observations. For 1986 your code produces (as did mine xtile-line) nine quantiles and in 1987 ten quantiles."

          Comment


          • #6
            Dear Stata-Community,
            I found the solution in the “egenmore” command, which allows usage with bysort and use of by().
            The latter helped me generating one variable “decile” which included deciles of my aux-variable “b_employees” grouped by years. It was very quick in calculation and generation.

            Code:
            egen decile = xtile(b_employee), by(year) nq(10)
            Afterwards one can show each year's (e.g. 1985) decile allocations separately with (similar output as shown in my recent postings):

            Code:
            tab decile if year==1985
            or the decile allocations over the whole sample with (total observations per decile 1 to 10):

            Code:
            tab decile

            Problem solved !










            Comment

            Working...
            X