Dividing the dataset into tertiles

Azhar Mughal

Join Date: Feb 2019
Posts: 7


02 Apr 2019, 06:02

Dear Statalist members,

My actual dataset is large, so I am posting an example here.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte firm float(var1 da)
1   .3440065  .47212315
1   .8086346 .071356334
1   .7810416  .58836746
1  .59249973  -.5805435
1    .763523   .4240699
2  .58147943   .6231582
2   .4374028  .29705063
2  .22375245   .8810358
2   .4670937  -.8361621
2    .329619  .28815654
2   .3748362  .17793214
3   .7174032   .8677467
3   .6616882  -.3164898
3   .7876359   .3057091
3   .8956375   .7337653
3   .7788198   .5092093
4   .9330595  .13902761
4 .009004894    .342617
4    .671393  -.8709255
4   .4765641    .896404
4   .9045753  .03744958
5   .7953129    .436842
5   .4977751   .4488074
5  .01935726   .7797458
5  .53233165   .1224825
5  .08152536    .874486
6   .3793468  -.5277116
6  .56820965  .22691797
6  .04368795   .3986916
7    .913205   .4714641
7   .9004316   .7407852
7   .7772265  -.6351842
7   .5769991  .13843548
7  .16265187  .27961093
7   .5061803  -.8985645
end

I want to divide the firms into tertiles so that firms with high values of da fall in the upper tertile and firms with low values of da fall in the lower tertile. Note that the number of firms is not a multiple of 3.

I used this code:

xtile tda = da, nq(3)

This command divides the dataset, but the same firm appears in the upper as well as the lower tertile. Each firm should appear only once, i.e. in exactly one tertile.

My question

How can I solve this problem?
After dividing the dataset, I also want to regress da on var1 within each tertile.

Thanks in advance

Kind Regards
Azhar Mughal


Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

02 Apr 2019, 07:24

Indeed, your binning makes no reference to the firm identifier, so you got what you asked for and didn't get what you didn't ask for.
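To see the point concretely, here is a minimal sketch that flags firms whose observations land in more than one bin when xtile is applied to da directly. So that it runs on its own, it uses only the first three observations of each firm from the example in #1; the names spans and firmtag are made up for illustration.

```stata
* First three observations of each firm from the example in #1
clear
input byte firm float da
1  .47212315
1 .071356334
1  .58836746
2   .6231582
2  .29705063
2   .8810358
3   .8677467
3  -.3164898
3   .3057091
4  .13902761
4    .342617
4  -.8709255
5    .436842
5   .4488074
5   .7797458
6  -.5277116
6  .22691797
6   .3986916
7   .4714641
7   .7407852
7  -.6351842
end

* Observation-level tertiles, as in #1: the firm identifier plays no role
xtile tda = da, nq(3)

* spans (hypothetical name) is 1 if a firm's lowest and highest bins differ
bysort firm (tda): generate byte spans = tda[1] != tda[_N]

* firmtag (hypothetical name) marks one observation per firm
egen byte firmtag = tag(firm)

* Firms that fall into more than one tertile
list firm if firmtag & spans, noobs
count if firmtag & spans
```

Any firm listed has values of da on both sides of at least one cut point, which is why it shows up in several tertiles.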

But what is your precise rule for binning? For example, you could bin on the mean of a variable for each firm, and so on.
Azhar Mughal

Join Date: Feb 2019

Posts: 7
#3

03 Apr 2019, 07:12

My mistake, I did not explain properly. I want to divide all the firm ids, along with the other variables, into three groups (tertiles or terciles) based on one variable, da. Each firm has multiple values of da in the dataset (high as well as low), so each firm was appearing in the upper as well as the lower group.

Following your guidance, I calculated the mean of da for each firm (meanDA) and then generated a variable named tertile that assigns 1, 2 or 3 to each firm. The code is as follows:

bysort firm: egen meanDA = mean(da)

sort meanDA

xtile tertile = meanDA, nq(3)

Is this a correct way of dividing the dataset into tertiles, and am I using the right term (tertile or tercile) for my question? Or is there another way of doing this? I really appreciate your time.

Kind Regards
Nick Cox

Join Date: Mar 2014

Posts: 35698
#4

03 Apr 2019, 09:53

You make the rules. In particular,

1. Using the mean was just an example. I have no way of knowing what best matches your goals.

2. Using xtile as you did produces, as best it can, equal numbers of observations in three groups. If you want equal numbers of firms as an ideal, you need different code.
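As one possible sketch of point 2, and still assuming the per-firm mean is the rule you want, xtile can be restricted to a single observation per firm so that the cut points are based on firms rather than observations. The data are again the first three observations of each firm from #1, and firm_mean, firmtag and firm_tertile are illustrative names only.

```stata
* First three observations of each firm from the example in #1
clear
input byte firm float da
1  .47212315
1 .071356334
1  .58836746
2   .6231582
2  .29705063
2   .8810358
3   .8677467
3  -.3164898
3   .3057091
4  .13902761
4    .342617
4  -.8709255
5    .436842
5   .4488074
5   .7797458
6  -.5277116
6  .22691797
6   .3986916
7   .4714641
7   .7407852
7  -.6351842
end

* Per-firm mean of da (the binning rule is an assumption; others are possible)
bysort firm: egen firm_mean = mean(da)

* Mark exactly one observation per firm
egen byte firmtag = tag(firm)

* Tertiles computed over firms: only the tagged observations are used
xtile firm_tertile = firm_mean if firmtag, nq(3)

* Copy each firm's tertile to all of its observations
* (missing values sort last, so the first value in each firm is the nonmissing one)
bysort firm (firm_tertile): replace firm_tertile = firm_tertile[1]

* Number of firms in each tertile
tabulate firm_tertile if firmtag
```

With a number of firms that is not a multiple of 3, the groups cannot be exactly equal in size; xtile only gets as close as the cut points allow.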

On the terminology, see

https://www.stata-journal.com/articl...article=st0465

https://stats.stackexchange.com/ques.../235334#235334

https://www.stata-journal.com/articl...article=dm0095
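On the regression asked about in #1, a minimal sketch is to loop over the groups and restrict regress with an if qualifier. It assumes the grouping variable is the tertile built in #3, and it uses only the first three observations of each firm from #1 so that it runs on its own; with so few observations the estimates mean nothing and only show the mechanics.

```stata
* First three observations of each firm from the example in #1
clear
input byte firm float(var1 da)
1   .3440065  .47212315
1   .8086346 .071356334
1   .7810416  .58836746
2  .58147943   .6231582
2   .4374028  .29705063
2  .22375245   .8810358
3   .7174032   .8677467
3   .6616882  -.3164898
3   .7876359   .3057091
4   .9330595  .13902761
4 .009004894    .342617
4    .671393  -.8709255
5   .7953129    .436842
5   .4977751   .4488074
5  .01935726   .7797458
6   .3793468  -.5277116
6  .56820965  .22691797
6  .04368795   .3986916
7    .913205   .4714641
7   .9004316   .7407852
7   .7772265  -.6351842
end

* Grouping as in #3: per-firm mean of da, then tertiles of that mean
bysort firm: egen meanDA = mean(da)
xtile tertile = meanDA, nq(3)

* One regression of da on var1 per tertile
forvalues t = 1/3 {
    display as text _newline "Tertile `t'"
    regress da var1 if tertile == `t'
}
```

The same loop works with any other grouping variable that takes the values 1, 2 and 3.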

Last edited by Nick Cox; 03 Apr 2019, 09:56.