  • Quartiles (or centiles) by group

    Dear all,

    I am trying to do something conceptually fairly simple. I would like to create a group variable that tells me which quartile an observation falls into according to the value of a variable. I have tried to do it this way:

    by group year: xtile quant=x, nq(4)

    but it didn't work. I have also tried a bunch of similar commands, but none seemed to be effective. The problem is that none of the functions computing quantiles accepts the by option. Can someone suggest something better?

    Many thanks!
    Riccardo
    Last edited by Riccardo Valboni; 12 Jun 2014, 08:43.

  • #2
    Hi Riccardo,
    There are two options, you can either use a loop:
    * lets assume your year goes from 1990 to 2000
    gen quant=.
    forvalues i=1990/2000 {
    capture drop xq
    xtile xq=x if year==`i', nq(4)
    replace quant=x if year==`i'
    }

    The other is to install egenmore (ssc install egenmore), and use the following:
    egenmore quant=xtile(x), n(4) by(year)

    I think both are equally efficient, so since you are doing this by year and ID, you should expect it to take a relatively long time to run the commands.
    Fernando

    • #3
      Fernando meant

      Code:
       
      egen quant=xtile(x), n(4) by(year)
      egenmore is the package name only; the package includes a ragbag of more functions for egen.

      • #4
        Thanks Nick.

        • #5
          Great, it works even with more than one level of by.

          • #6
            Dear all,
            My aim is to generate quintiles of a continuous variable (alcohol use/g; variable name: alc) by sex (variable: sex). The range in alc is [0, 1700].

            Are there any ways of doing this very simple task other than those suggested by Nick and Fernando above? I am finding it pretty hard to apply Fernando's suggestion to my variables (I am pretty new to Stata), and I am using Stata at our institute, where downloading packages online is either very hard or not allowed. I am using StataSE 16.0. Any advice would be appreciated.
            Thank you!

            • #7
              It’s not clear whether you want quintile values or the bins they delimit. Either way, the trick is just to repeat, say, xtile or pctile or whatever command is needed for each distinct value of sex.

              • #8
                Here's a dopey example with quintile bins done separately by distinct values of foreign for mpg in the auto dataset.


                Code:
                . sysuse auto, clear
                (1978 Automobile Data)
                
                . xtile qmpg1=mpg if foreign==1, nq(5)
                
                . xtile qmpg0=mpg if foreign==0, nq(5)
                
                . gen qmpg = max(qmpg0, qmpg1)
                
                .
                . tab qmpg foreign
                
                           |       Car type
                      qmpg |  Domestic    Foreign |     Total
                -----------+----------------------+----------
                         1 |        13          5 |        18
                         2 |         9          5 |        14
                         3 |        11          5 |        16
                         4 |        11          3 |        14
                         5 |         8          4 |        12
                -----------+----------------------+----------
                     Total |        52         22 |        74
                
                .
                This is the same method as that of @Fernando Rios in #2, except that there's a typo in his main code segment (and manifestly, he's using 4 bins, not 5).

                Code:
                gen quant=.
                forvalues i=1990/2000 {
                    capture drop xq
                    xtile xq=x if year==`i', nq(4)
                    replace quant=xq if year==`i'
                }
                Last edited by Nick Cox; 08 Nov 2019, 12:36.
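
                A variation on the same loop, offered only as a sketch: levelsof collects the distinct values of the grouping variable, so the range of values need not be typed in. This assumes the grouping variable (here foreign) is numeric; qmpg_loop is a made-up variable name.

                Code:
                sysuse auto, clear
                * holder for the bin numbers
                gen qmpg_loop = .
                * distinct values of the grouping variable, stored in local macro levels
                levelsof foreign, local(levels)
                foreach l of local levels {
                    * bins computed within one group only; the temporary variable is missing elsewhere
                    tempvar q
                    xtile `q' = mpg if foreign == `l', nq(5)
                    replace qmpg_loop = `q' if foreign == `l'
                    drop `q'
                }
                tab qmpg_loop foreign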

                • #9
                  Thank you very much Nick! This worked well for men (sex==1):
                  Code:
                  xtile qalc_1=alc if sex==1, nq(5)
                  xtile qalc_2=alc if sex==2, nq(5)
                  gen qalc = max(qalc_1, qalc_2)
                  However, for women (sex==2) there were zero observations in the second quintile, whereas 1,908 observations (48% of all women) fell in the first quintile (see below). I suspect this might be due to the skewness of the original alcohol variable.
                  Code:
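                  tab qalc sex

                             |          sex
                         alc |         1          2 |     Total
                  -----------+----------------------+----------
                           1 |       797      1,908 |     2,705
                           2 |       579          0 |       579
                           3 |       695        566 |     1,261
                           4 |       673        767 |     1,440
                           5 |       686        759 |     1,445
                  -----------+----------------------+----------
                       Total |     3,430      4,000 |     7,430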
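
                  One way to check that suspicion, as a sketch only: _pctile leaves the four cutpoints for quintiles in r(r1) to r(r4). If the first two cutpoints turn out equal for women, xtile has nothing left to put in the second bin. The variable names follow the code above; the scalar name c1 is made up.

                  Code:
                  * cutpoints for quintiles of alc among women
                  _pctile alc if sex == 2, nq(5)
                  * keep the first cutpoint before anything else overwrites r()
                  scalar c1 = r(r1)
                  return list
                  * how many women share the value at the first cutpoint
                  count if sex == 2 & alc == scalar(c1)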

                  • #10
                    #9 Those quintile bins look fairly useless in practice. If you need to categorise at all, I would use the values of your variable directly.

                    More discussion at https://www.stata-journal.com/articl...article=dm0095

                    https://www.stata-journal.com/articl...article=pr0054

                    • #11
                      Nick, when I run
                      egen quant=xtile(x), n(4) by(year)
                      it says tile not found. I am running Stata 14.2 SE! I downloaded egenmore again, but it still says unknown function xtile(). I even tried running

                      xtile interd_quartile = interd, n(4) by (career_year)

                      but it led to the error: option by not allowed
                      Last edited by Shivam agrawal; 04 Dec 2019, 19:54.

                      • #12
                        #11 Installing egenmore from SSC will have precisely no consequences for xtile. It will not mean that xtile now supports a by() option. So your last problem report is not at all surprising.

                        But you say that you downloaded egenmore "again". From your post I can only guess that you put the files in the wrong place.

                        A correct installation will mean that Stata can see a file _gxtile.ado so that asking which will show you something like this:

                        Code:
                        . which _gxtile.ado
                        c:\ado\plus\_\_gxtile.ado
                        *! _gxtile version 1.2 UK 08 Mai 2006
                        It doesn't matter if your location is different: for example, you may not be using Windows, or your set-up may vary otherwise. What does matter is that Stata can find that program file to use it.

                        What happens sometimes is that people install files with their browser and put them in the wrong place. Or something dopey happens, such as the files acquiring an irrelevant extension .html.
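
                        Two standard commands can help with that check, sketched here: adopath lists the directories Stata searches for ado-files, and findfile looks for a named file along that path. What they report will depend on your own set-up.

                        Code:
                        * directories searched for ado-files, in order
                        adopath
                        * look for the egen function file along that path; an error message appears if it is not found
                        capture noisily findfile _gxtile.ado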

                        Note that https://ideas.repec.org/c/boc/bocode/s386401.html explicitly advises

                        Note: This module may be installed from within Stata by typing "ssc install egenmore". Windows users should not attempt to download these files with a web browser.

                        Here is a self-contained example you can run which should work:

                        Code:
                         
                        . sysuse auto, clear
                        (1978 Automobile Data)
                        
                        . ssc install egenmore, replace 
                        
                        . egen wanted = xtile(mpg), n(2) by(foreign)
                        
                        . tab wanted foreign
                        
                                   |       Car type
                            wanted |  Domestic    Foreign |     Total
                        -----------+----------------------+----------
                                 1 |        30         11 |        41 
                                 2 |        22         11 |        33 
                        -----------+----------------------+----------
                             Total |        52         22 |        74
                        Here is the code all in one for convenience:

                        Code:
                        sysuse auto, clear
                        ssc install egenmore, replace 
                        egen wanted = xtile(mpg), n(2) by(foreign)
                        tab wanted foreign

                        • #13
                          Dear all,
                          I have a fairly simple task: I need to generate a new variable based on tertiles of the variable of interest, taking into account the classes of two different variables. In my case, I need to generate education tertiles (based on years of schooling) taking into account both sex and year of birth. I have done this taking into account only year of birth as follows:
                          Code:
                          egen education_tertiles =xtile(education_years), n(3) by(year_of_birth)
                          My question is: how do I add gender to the by() option? Many thanks for any tips!

                          Best regards,
                          Otto

                          • #14
                            Code:
                            by(year_of_birth gender) 

                            • #15
                              Dear Nick,
                              Thanks for the prompt answer! My intuition does not always match Stata's (I was trying with commas between the variables, still learning...).

                              A follow-up: when generating the new variable, I should assign tied values the mean of their ranks. In SPSS, this is done by default (see here): "First, the observations are ordered and given unique, sequential ranks. Then, tied observations have their assigned ranks averaged together." Could this be done in Stata?
                              Best,
                              Otto
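
                              One possible route in Stata, as a sketch rather than a reproduction of SPSS: the rank() function of official egen assigns tied observations the average rank by default and can be used under by:. The ranks can then be cut into three bins. The names edu_rank, edu_n, and edu_tertile are made up here, and the binning rule ceil(3 * rank / n) is an assumption that may not match the SPSS formula.

                              Code:
                              * average ranks for ties is the default of egen's rank()
                              bysort year_of_birth gender: egen edu_rank = rank(education_years)
                              * number of nonmissing values in each group
                              bysort year_of_birth gender: egen edu_n = count(education_years)
                              * assumed binning rule: split the scaled rank into three classes
                              gen edu_tertile = ceil(3 * edu_rank / edu_n)

                              Tied observations get the same average rank, so they always land in the same class under this rule.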
