  • Splitting data into quintiles

    I had a question about xtile in Stata.

    I have a variable with 254 values (ranging from 0.282 to 2.403, at irregular increments). I need to split it into quintiles, that is, into bins that each hold approximately 20% of the values.
    Right now my code looks like
    Code:
     xtile quintileVPeps=VPeps, nq(5)
    where VPeps is the variable holding the values. I want to get 5 equal-sized bins but it seems that Stata gives me funky quintiles.

    In Q5 (top quintile), there are 28 numbers
    in Q4 68 numbers
    in Q3 89 numbers
    in Q2 53 numbers
    in Q1 (bottom quintile) only 7..

    Is there a way I can split the numbers so that each quintile has 20% of the data?

    Thanks,

  • #2
    Code:
     
    scatter quintileVPeps VPeps
    should explain all. The answer to the last question is going to be: Not helpfully. In fact, if you really need quintile-based bins, you will be discarding information in the data.

    • #3
      Thanks Nick! So there is no way in Stata to divide into quintiles without losing data? If I set up a count variable such as
      Code:
      gen count=_n
      so that the values increase in steps of 1, won't xtile then give me 20% tiles?

      Then what would be the best way to divide into 20% tiles?

      Thank you as always :D

      • #4
        So using _n didn't work either.
        Seems that when I do
        Code:
        pctile pct=VPeps, nq(5)
        the new variable pct contains all dots (.).
        Any reasons for that?

        So I can't split my data into equal quintiles...??
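
        A note on the missing values: pctile does not assign a bin to each observation. It stores the nq() - 1 cutpoints in the first observations of the new variable and leaves the rest missing, so with nq(5) only the first four observations are filled. A minimal sketch using the auto dataset shipped with Stata (the variable name pct is just an illustration):

        ```stata
        sysuse auto, clear
        * pctile writes the 4 quintile cutpoints of mpg into observations 1 to 4
        pctile pct = mpg, nq(5)
        list pct in 1/4
        * every other observation of pct is missing, which shows up as dots
        count if missing(pct)
        ```

        To label each observation with its bin, xtile is the matching command; pctile only reports where the cutpoints fall.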

        • #5
          This problem is often explained on this list. See e.g. the last post in http://www.statalist.org/forums/foru...dent-variables and more generally search the forum for threads on xtile

          There is no way in any program to divide into quantile-based bins without losing data unless

          1. The number of values is a multiple of the number of bins, where by the usual mathematical convention "multiple" means "exact multiple".

          2. Values are constant within each bin. Not only is this rarely the case, ties often frustrate the aim of producing bins that are equally numerous.

          I'll guess with probability 0.9 that you are an economist interested in the performance of firms or equivalently their shares. If so, the urge to do this is evidently a tribal compulsion.

          • #6
            Thanks Nick. Yes, the number of firms is not an exact multiple of 5, but I think I can live with losing a little bit of data, because right now my quintiles are not really "quintiles": they don't have approximately similar numbers of observations in them.
            So what would be the best solution? I'd be fine with losing a little bit of data.

            • #7
              Best for what purpose? You haven't explained why you think you need bins anyway.

              • #8
                I'm trying to construct a portfolio of firms based on their V/P ratios. V represents fundamental firm value that I've created. I'm taking a long (short) position in the top (bottom) quintile of firms according to their V/P ratios.
                Right now, since the quintiles do not have equal number of firms, the results are biased.
                So in order for the portfolio returns to be unbiased, I need the quintiles to have approximately a similar number of firms within them.

                • #9
                  Show us the result of

                  Code:
                   
                  quantile VPeps
                  and people may have suggestions on what they would do.

                  • #10
                    [Image: quantile plot of VPeps] Looks like this...

                    • #11
                      I tried using
                      Code:
                      egen quintile=cut(VPeps), group(5)
                      and I think this is the closest to what I need.
                      I just need to add an "if" qualifier to that command, but apparently Stata does not allow if to be used with the above code.
                      Is there a way I can use if here?
                      I need to rank by quintile only where the month variable is December.

                      So I tried
                      Code:
                      egen quintileVP=cut(VP), group(5) if month=="12"
                      but it didn't work.

                      Any suggestions?

                      • #12
                        Code:
                        egen quintileVP=cut(VP) if month=="12", group(5)
                        This ended up working, but the quintile sizes are still unequal.

                        Q5: 29 firms
                        Q4: 70
                        Q3: 93
                        Q2: 54
                        Q1: 8

                        which is similar to what xtile did...but I'd really want my groups to be divided equally. Since I have a total of 254 firms in that sample, I'd want approximately 51 firms in each quintile.
                        This is what I'd want (potentially)

                        Q5: 50
                        Q4: 51
                        Q3: 51
                        Q2: 51
                        Q1: 51

                        Is there absolutely no way I can divide my groups into equal sizes?
                        Even if it's with losing data, I would want to try it out so please let me know!

                        • #13
                          It's really strange but I think I've figured it out. When I do
                          Code:
                          xtile quintile1962=VPeps if yearmonth=="196212", nq(5)
                          xtile splits my 254 data points into approximately equal-sized quintiles (close to perfectly).
                          So when I gave xtile a specific year and month to split, it did the trick for me.
                          I ended up looping it to figure out the quintiles for all my firm-year combinations.

                          Thanks everyone for your input!

                          • #14
                            Final code looked like this
                            Code:
                            qui forval yr=1962/2013 {
                                xtile quintile`yr'=VPeps if yearmonth=="`yr'12", nq(5)
                            }
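
                            The loop above creates one variable per year. A possible variant, sketched under the assumption that yearmonth is a string such as "196212" as in the posts above, collects the within-December quintiles in a single variable (quintile is a hypothetical name, not from the original code):

                            ```stata
                            * one quintile variable, filled year by year from each December cross-section
                            generate byte quintile = .
                            forvalues yr = 1962/2013 {
                                tempvar q
                                * capture lets the loop continue if a year has no December observations
                                capture xtile `q' = VPeps if yearmonth == "`yr'12", nq(5)
                                if _rc == 0 {
                                    quietly replace quintile = `q' if yearmonth == "`yr'12"
                                    drop `q'
                                }
                            }
                            * check bin sizes within each December
                            tabulate yearmonth quintile
                            ```

                            Either way, the cutpoints are computed within each December rather than across the pooled data, which is why the bin sizes come out close to equal.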

                            • #15
                              I can't add much to my earlier comments. In #5 I explained why the numbers vary from bin to bin. As the response in question is highly skewed, the quantile plot does not make it especially clear but something like

                              Code:
                               
                              sysuse auto
                              xtile qmpg = mpg, nq(5)
                              tab mpg qmpg
                              applied to your data may help you see the main difficulty, which is that equal values must end up in the same bin. egen, cut() just uses a different rule at boundaries, but whether clumps of equal values are pushed up or down, it can't solve the difficulty absolutely if xtile cannot find an exact solution.

                              More discussion within http://www.stata-journal.com/article...article=pr0054
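
                              If equal-sized bins matter more than keeping tied values together, one possible sketch is to rank with ties broken arbitrarily and cut the ranks into five groups. This is not what xtile does; the names rk and qeq are made up for illustration, and which of several tied observations lands in which bin is arbitrary.

                              ```stata
                              sysuse auto, clear
                              * rank mpg, breaking ties arbitrarily so every observation gets a distinct rank
                              egen rk = rank(mpg), unique
                              quietly count if !missing(mpg)
                              local n = r(N)
                              * ranks 1..n are cut into five groups of (nearly) equal size
                              generate byte qeq = ceil(5 * rk / `n')
                              tabulate qeq
                              * equal mpg values can now straddle two bins
                              tabulate mpg qeq
                              ```

                              The second table shows the cost: observations with identical mpg are assigned to different bins, so bin membership at the boundaries carries no information about the variable itself.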

