  • Dividing the dataset into tertiles

    Dear Statalist members,

    My actual dataset is large, so I am posting an example here.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte firm float(var1 da)
    1   .3440065  .47212315
    1   .8086346 .071356334
    1   .7810416  .58836746
    1  .59249973  -.5805435
    1    .763523   .4240699
    2  .58147943   .6231582
    2   .4374028  .29705063
    2  .22375245   .8810358
    2   .4670937  -.8361621
    2    .329619  .28815654
    2   .3748362  .17793214
    3   .7174032   .8677467
    3   .6616882  -.3164898
    3   .7876359   .3057091
    3   .8956375   .7337653
    3   .7788198   .5092093
    4   .9330595  .13902761
    4 .009004894    .342617
    4    .671393  -.8709255
    4   .4765641    .896404
    4   .9045753  .03744958
    5   .7953129    .436842
    5   .4977751   .4488074
    5  .01935726   .7797458
    5  .53233165   .1224825
    5  .08152536    .874486
    6   .3793468  -.5277116
    6  .56820965  .22691797
    6  .04368795   .3986916
    7    .913205   .4714641
    7   .9004316   .7407852
    7   .7772265  -.6351842
    7   .5769991  .13843548
    7  .16265187  .27961093
    7   .5061803  -.8985645
    end

    I want to divide the firms into tertiles so that firms with high values of da fall in the upper tertile and firms with low values of da fall in the lower tertile. Note that the number of firms is not a multiple of 3.

    I used this code:

    xtile tda=da,nq(3)


    This command divides the dataset, but the problem is that the same firm appears in the upper as well as the lower tertile. Each firm should appear only once, i.e. in exactly one tertile.

    My Question

    How can I solve this problem?
    After dividing the dataset, I also want to regress da on var1 within each tertile.

    Thanks in advance

    Kind Regards
    Azhar Mughal



  • #2
    Indeed, your binning makes no reference to the firm identifier, so you got what you asked for and didn't get what you didn't ask for.

    But what is your precise rule for binning? For example, you could bin on the mean of a variable for each firm, and so on.



    • #3
      My mistake, I did not explain this properly. I wanted to divide all the firm ids, along with the other variables, into three groups (tertiles or terciles) based on one variable, "da". Each firm has multiple values of "da" in the dataset (high as well as low), so each firm was appearing in the upper as well as the lower group.

      Following your guidance, I calculated the mean of "da" for each firm ("meanDA") and then generated a variable named "tertile" that assigns 1, 2 or 3 to each firm. The code is as follows:

      bysort firm: egen meanDA=mean(da)

      sort meanDA

      xtile tertile=meanDA,nq(3)

      Is this the correct way to divide the dataset into tertiles, and am I using the right term (tertile / tercile) for my question? Or is there another way of doing this? I really appreciate your time.

      Kind Regards






      • #4
        You make the rules. In particular,

        1. Using the mean was just an example. I have no way of knowing what best matches your goals.

        2. Using xtile as you did produces, as best it can, equal numbers of observations in three groups. If you want equal numbers of firms as an ideal, you need different code.
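
        As a minimal sketch of what such different code might look like, assuming the firm mean of da is the summary you settle on: compute the cutpoints from one observation per firm, then copy the result to the rest of that firm's observations. The variable names mean_da, firmtag, tert_tagged and firm_tertile are illustrative only and do not appear earlier in the thread.

        ```stata
        * Firm-level mean of da (an assumption; any firm-level summary would do)
        bysort firm: egen mean_da = mean(da)

        * Flag exactly one observation per firm
        egen byte firmtag = tag(firm)

        * Compute tertiles over firms, not observations:
        * only the tagged observation of each firm enters xtile
        xtile tert_tagged = mean_da if firmtag, nq(3)

        * Spread each firm's tertile to all of its observations
        * (max() ignores the missing values on untagged rows)
        bysort firm: egen byte firm_tertile = max(tert_tagged)

        * Check how many firms landed in each tertile
        tabulate firm_tertile if firmtag

        * Regress da on var1 separately within each tertile
        bysort firm_tertile: regress da var1
        ```

        When the number of firms is not a multiple of 3, the groups cannot be exactly equal in size, and ties in mean_da can shift the counts further, so the tabulate step is worth running.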

        On the terminology, see

        https://www.stata-journal.com/articl...article=st0465

        https://stats.stackexchange.com/ques.../235334#235334

        https://www.stata-journal.com/articl...article=dm0095

        Last edited by Nick Cox; 03 Apr 2019, 09:56.
