
  • Creating box plots of the gap between two groups by deciles

    Hello, and thanks for taking the time to read and answer my question,

    I am working with Stata and have math grades for two different groups, A and B. I want to see the gap between the two groups in each decile. I also want a box plot of this gap for each decile (10 box plots in all, one per decile, each showing the gap between the groups' grades).

    First, I computed the deciles for each group using xtile:

    xtile decileA = mat if group==1, nq(10)
    xtile decileB = mat if group==0, nq(10)


    However, groups A and B differ in both number of observations and distribution. I thought of computing quantiles within each decile for each group and subtracting them, to get the difference within each decile at each quantile, and then building the box plot from those differences. But I do not know how to proceed from there to create the graph, and since each group's deciles contain different numbers of observations, I am not sure this approach is correct.

    Alternatively, if I use the pctile command and compute the difference at each decile, I lose all the variation within each decile: I get only a single difference at each decile cut point, not all the quantiles I want.

    For example:

    * pctile stores the 9 decile cut points in the first 9 observations
    pctile qA = mat if group==1, nq(10)
    pctile qB = mat if group==0, nq(10)
    gen qdiff = qA - qB if _n<10
    gen qtau = _n/10 if _n<10
    graph box qdiff, over(qtau)

    Is there a way to make the graph I have in mind? If so, I would really appreciate your help.

    Thanks, Karla

    Cross-posted at https://stackoverflow.com/questions/...ups-by-deciles
    Last edited by Karla Alfaro; 25 Feb 2019, 18:56.

  • #2
    Cross-posted at https://stackoverflow.com/questions/...ups-by-deciles
    Please note our cross-posting policy, which is explicit in the FAQ Advice: you are asked to tell us about it.
    Last edited by Nick Cox; 25 Feb 2019, 18:39.



    • #3
      Hello Nick,

      I am sorry about that; I am new to the site and did not know this. Thanks for the advice.
      I am editing my posts now.

      Karla



      • #4
        This procedure seems rather elaborate and based on a series of arbitrary decisions.

        As you mention, you have notable variation within each decile (meaning each decile bin or interval), so why bin at all? (Further, quantile binning is often more awkward than advertised, given tied values, which are extremely likely with grades in education.)
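        A minimal sketch of the tie problem, using the auto dataset as a stand-in for grades (an assumption: any variable with many repeated values behaves similarly). Because tied values must land in the same bin, the ten "equal" bins produced by xtile can differ noticeably in size, and some bins may even be empty.

```stata
* Sketch: deciles of a variable with many ties (auto data as a stand-in)
sysuse auto, clear
xtile mpg10 = mpg, nq(10)
* Tied mpg values always share a bin, so bin frequencies are uneven
tabulate mpg10
```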

        You have mathematics grades for two groups, so compare the quantile plots of the two groups.

        The first graph below uses qplot from the Stata Journal.

        Of the references below, the most important now are those from 1999, 2005, 2012 and 2016 (the last of these for downloading the software files).

        Code:
        SJ-16-3 gr42_7  . . . . . . . . . . . . . . . . . .  Software update for qplot
                (help qplot if installed) . . . . . . . . . . . . . . . . .  N. J. Cox
                Q3/16   SJ 16(3):813--814
                option midpoint has been added
        
        SJ-12-3 gr0053  . Speaking Stata: Axis practice, or what goes where on a graph
                (help multqplot if installed) . . . . . . . . . . . . . . .  N. J. Cox
                Q3/12   SJ 12(3):549--561
                discusses variations on what goes on each axis of a two-way
                plot; provides multiple quantile plots
        
        SJ-12-1 gr42_6  . . . . . . . . . . . . . . . . . .  Software update for qplot
                (help qplot if installed) . . . . . . . . . . . . . . . . .  N. J. Cox
                Q1/12   SJ 12(1):167
                provides better handling of y-axis titles
        
        SJ-10-4 gr42_5  . . . . . . . . . . . . . . . . . .  Software update for qplot
                (help qplot if installed) . . . . . . . . . . . . . . . . .  N. J. Cox
                Q4/10   SJ 10(4):691
                better handling of axis titling and missing values
        
        SJ-6-4  gr42_4  . . . . . . . . . . . . . . . . . .  Software update for qplot
                (help qplot if installed) . . . . . . . . . . . . . . . . .  N. J. Cox
                Q4/06   SJ 6(4):597
                better handling of x-axis titling, a new option allowing
                the user to specify an alternative plotting position or
                rank variable
        
        SJ-5-3  gr0018  . . . . . . . . . .  Speaking Stata: The protean quantile plot
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                Q3/05   SJ 5(3):442--460           (see gr41_3 and gr42_3 for commands)
                discusses quantile and distribution plots as used in
                the analysis of species abundance data in ecology
        
        SJ-5-3  gr42_3  . . . . . . . . . . . . . . . . . .  Software update for qplot
                (help qplot if installed) . . . . . . . . . . . . . . . . .  N. J. Cox
                Q3/05   SJ 5(3):471
                simplified syntax; both by() and over() are now allowed
        
        SJ-4-1  gr0003  . . . . . . . . . . . . Speaking Stata: Graphing distributions
                . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                Q1/04   SJ 4(1):66--88                                   (no commands)
                a review of official and user-written commands for
                graphing univariate distributions; includes tricks
                beyond what is obviously and readily available
        
        SJ-4-1  gr42_2  Software update for Quantile plots, generalized; renamed qplot
                (help qplot if installed) . . . . . . . . . . . . . . . . .  N. J. Cox
                Q1/04   SJ 4(1):97
                software update for quantil2, now renamed qplot
        
        STB-61  gr42.1  . . . . . . . Quantile plots, generalized: update to Stata 7.0
                (help quantil2 if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                5/01    pp.10--11; STB Reprints Vol 10, pp.55--56
                updated for use with Stata 7
        
        STB-51  gr42  . . . . . . . . . . . . . . . . . .  Quantile plots, generalized
                (help quantil2 if installed)  . . . . . . . . . . . . . . .  N. J. Cox
                9/99    pp.16--18; STB Reprints Vol 9, pp.113--116
                generalizes the quantile command by allowing more than one
                variable (or a by() option), more graphical choices, general
                plotting positions, and an option to reverse the order
        The mpg variable in the auto dataset also comes in two groups of unequal size (domestic and foreign cars). qplot adjusts for that by using plotting positions of the form (rank - constant) / (sample size - 2 constant + 1). Roughly speaking, this is a graph of empirical cumulative distribution functions transposed, although I would argue that plotting the response or outcome (grades in your case) on the vertical axis makes inspection of detail easier and more effective.
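        Written out, with i the rank within a group, n that group's size and a the constant, the plotting position is as follows. The worked values assume a = 0.5, a common choice used here only for illustration, together with the auto group sizes of 52 domestic and 22 foreign cars. Both groups' positions fall inside (0, 1), which is what makes curves from groups of unequal size comparable.

```latex
p_i = \frac{i - a}{n - 2a + 1}, \qquad i = 1, \dots, n

a = 0.5:\quad p_i = \frac{i - 0.5}{n}, \qquad
p_1^{\text{foreign}} = \frac{0.5}{22} \approx 0.023, \qquad
p_1^{\text{domestic}} = \frac{0.5}{52} \approx 0.0096
```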

        Code:
        . sysuse auto, clear
        (1978 Automobile Data)
        
        . qplot mpg, over(foreign) aspect(1) scheme(s1color)
        [Image: karla1.png, the qplot graph produced by the command above]
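        To show what the plotting positions do, here is a hand-made sketch of a similar graph with official twoway. It assumes a = 0.5 (check the qplot help for its own default), so (rank - a) / (n - 2a + 1) reduces to (rank - 0.5) / n within each group. The variable name pp is hypothetical, and ties are given distinct ranks in arbitrary order, which may not match qplot's handling exactly.

```stata
* Sketch: a hand-made quantile plot in the spirit of qplot, assuming
* a = 0.5, so that (rank - a) / (n - 2a + 1) becomes (rank - 0.5) / n
sysuse auto, clear
* Within each group, _n is the rank of mpg and _N is the group size;
* tied values receive distinct, arbitrarily ordered ranks
bysort foreign (mpg): generate pp = (_n - 0.5) / _N
label variable pp "fraction of data"
twoway (scatter mpg pp if foreign == 0, ms(Oh))              ///
       (scatter mpg pp if foreign == 1, ms(+)),              ///
       legend(order(1 "Domestic" 2 "Foreign")) aspect(1) scheme(s1color)
```

        Run it from a do-file, since the /// continuations are not accepted at the interactive prompt.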

        Another possibly useful command is cquantile from SSC, which offers a route to quantile-quantile plots for two groups.

        Code:
        cquantile mpg, by(foreign) generate(mpg0 mpg1)
        
        qqplot mpg1 mpg0, ms(Oh) scheme(s1color)
        [Image: karla2.png, the qqplot graph produced by the command above]
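        Since the original question is about the gap between groups, one variation is to plot the difference between matched quantiles instead of the quantiles themselves: the line of equality in the q-q plot becomes a zero line. A minimal sketch, assuming (as the qqplot call implies) that mpg0 holds quantiles for foreign == 0 and mpg1 the matching quantiles for foreign == 1; the variable gap is a hypothetical name.

```stata
* Sketch: plot the gap between matched quantiles of the two groups
* (requires cquantile from SSC: ssc install cquantile)
sysuse auto, clear
cquantile mpg, by(foreign) generate(mpg0 mpg1)
* Assumed: mpg0 holds quantiles for foreign == 0, mpg1 for foreign == 1
generate gap = mpg1 - mpg0
label variable gap "mpg: Foreign minus Domestic"
* Points above the zero line: foreign cars have the higher quantile
scatter gap mpg0, ms(Oh) yline(0) scheme(s1color)
```

        With grades, the same pattern would show the gap between groups A and B across the whole distribution rather than at ten decile points only.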

        If you have many thousands of values, consider comparing just the letter values:

        Code:
        SJ-16-4 st0465  . . . . .  Speaking Stata: Letter values as selected quantiles
                (help lvalues)  . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
                Q4/16   SJ 16(4):1058--1071
                supports calculation of letter values without the 21 letter
                limit of lv and is designed to save results in new variables
                for as many variables and distinct groups as are specified
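        For a quick comparison without installing anything, the official lv command mentioned in the reference above (with its 21-letter limit) can be run group by group. A minimal sketch on the auto data; the lvalues command is not used here because its syntax is not shown in this thread.

```stata
* Sketch: letter values (median, fourths, eighths, ...) of mpg for each
* group, using the official lv command with the by prefix
sysuse auto, clear
bysort foreign: lv mpg
```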



        • #5
          Thanks a lot, Nick.
