  • Twoway connected scatter plot

    Hello!
    I am trying to display data at two time points, using box plots combined with a scatter plot in which each individual's data at time 1 is connected to their data at time 2.
    Having searched around, I'm fine with the box plots and the scatter plots, but cannot connect each pair of dots.

    I've illustrated below with bplong.
    The problem is that only one connecting line appears, apparently joining the means at the two time points.

    I've seen xtline referred to but don't understand how to utilize that.
    Thanks for any guidance!
    Tim

    Here's the code:
    sysuse bplong
    drop if patient > 10 // only need small n to illustrate
    sort when

    * Generate parameters for box plot
    by when: egen med = median(bp)
    by when: egen lqt = pctile(bp), p(25)
    by when: egen uqt = pctile(bp), p(75)

    * generate graph
    twoway rbar lqt med when, barw(.5) fcolor(gs12) lcolor(black) || ///
    rbar med uqt when, barw(.5) fcolor(gs12) lcolor(black) || ///
    scatter bp when, graphregion(fcolor(gs15)) mcolor(black) msymbol(o) msize(small) ///
    connect(l) ///
    legend(off) xlabel( 1 "Before" 2 "After") ///
    ytitle(BP)

    Here's the result:
    [Image: BPGraph.png]

  • #2
    Here are some more ideas.

    Code:
    clear 
    
    set scheme s1color 
    sysuse bplong
    
    sort when
    
    * Generate parameters for box plot
    by when: egen med = median(bp)
    by when: egen lqt = pctile(bp), p(25)
    by when: egen uqt = pctile(bp), p(75)
    by when: egen p95 = pctile(bp), p(95)
    by when: egen p5 = pctile(bp), p(5)
    
    
    gen when2 = cond(when == 1, 0.85, 2.15) 
    
    * generate graph
    sort patient when 
    
    twoway rbar lqt med when2, barw(.1) fcolor(gs12) lcolor(black) || ///
    rbar med uqt when2, barw(.1) fcolor(gs12) lcolor(black) || ///
    rspike uqt p95 when2 || rspike p5 lqt when2 || ///
    scatter bp when2 if bp < p5 | bp > p95, ms(Oh) || /// 
    scatter bp when, graphregion(fcolor(gs15)) mcolor(black) msymbol(o) msize(small) ///
    connect(L) ///
    legend(off) xlabel( 1 "Before" 2 "After") ///
    ytitle(BP)

    [Image: bp.png]


    • #3
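      The specific problem with #1 was that your sort order was biting. The data were sorted on when, so points were connected first among those with when == 1 and then among those with when == 2. There is one and only one connection between 1 and 2, at the point where when changes from 1 to 2. In #2 the data are instead sorted on patient and when, and connect(L) joins successive points only while the x variable is increasing, so each patient's pair of points gets its own line.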
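
      As a minimal sketch of that fix, here is the scatter layer from #1 on its own, with only the sort order and the connect() style changed. The box plot layers are left out to keep the example short.

      Code:
      sysuse bplong, clear
      drop if patient > 10 // small n, as in #1

      * sort within patient so that each Before point is followed by its After point
      sort patient when

      * connect(L) draws a line to the next point only if x increases,
      * so no line is drawn back from one patient's After to the next patient's Before
      twoway scatter bp when, connect(L) mcolor(black) msymbol(o) msize(small) ///
          legend(off) xlabel(1 "Before" 2 "After") ytitle(BP)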

      It's a user's choice, but in #2 I exemplified a box plot with whiskers to the 5th and 95th percentiles.

      The Tukey rule, to show points separately if and only if they are more than 1.5 IQR away from the nearer quartile, seems hard for many people (depending on their role) to follow, to remember or to explain. My own view is that its original rationale, largely that of being fairly easy to calculate "by hand", has long since lost its point. Some percentile rule is fine for spotting extreme outliers if they exist. Precedents for such practice are cited in the help for stripplot from SSC.
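
      To see how the two rules differ on a given dataset, something like the following sketch may help. It flags points under each rule and cross-tabulates the flags. The variable names iqr, tukey_out and pct_out are made up for this illustration.

      Code:
      sysuse bplong, clear

      * quartiles and outer percentiles within each time point
      bysort when: egen lqt = pctile(bp), p(25)
      bysort when: egen uqt = pctile(bp), p(75)
      bysort when: egen p5 = pctile(bp), p(5)
      bysort when: egen p95 = pctile(bp), p(95)

      * Tukey rule: more than 1.5 IQR beyond the nearer quartile
      gen iqr = uqt - lqt
      gen byte tukey_out = (bp < lqt - 1.5 * iqr | bp > uqt + 1.5 * iqr) if !missing(bp)

      * percentile rule as in #2: outside the 5th to 95th percentile range
      gen byte pct_out = (bp < p5 | bp > p95) if !missing(bp)

      tabulate tukey_out pct_out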


      • #4
        This is spectacular, both in appearance and in content!
        I am most grateful.
        Tim


        • #5
          Thanks, but my latest thoughts are that the spaghetti does not help. bplong is fictional, but if your real data are as messy or messier, try a scatter plot with marginal box plots. It won't be worse!
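
          A rough sketch of the scatter part of that idea, assuming the data are first reshaped to one row per patient (so that bp1 is Before and bp2 is After). The line y = x is an addition for reference: points above it went up and points below it went down. The marginal box plots are not shown here.

          Code:
          sysuse bplong, clear

          * one observation per patient: bp1 = Before, bp2 = After
          reshape wide bp, i(patient) j(when)

          * After against Before, with a line of equality over the range of bp1
          twoway scatter bp2 bp1, msymbol(Oh) || ///
              function y = x, range(bp1) ///
              legend(off) xtitle("BP before") ytitle("BP after")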


          • #6
            See also https://www.stata-journal.com/articl...article=gr0041


            • #7
              Thanks. I very much like the marginal box plots.
              A low-level graph formatting question: my understanding is that, by default, twoway puts the two box plots at x-axis positions 1 and 2. What's the best way to move them closer together without Stata changing the widths of the boxes? I believe there is a complicated interaction between xlabel() and xscale(), which I found hard to understand. Might you have guidance?


              • #8
                The variable when has values 1 and 2. I put the box plots at 0.85 and 2.15, so if you want them closer you change the positions to nearer 1 and 2 respectively. If you do that, the bar widths will be larger relative to the slightly reduced range.
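
                A sketch of one way to experiment with this, showing the boxes only. The positions 0.95 and 2.05, the width 0.08 and the axis range 0.5 to 2.5 are arbitrary values chosen for illustration. barw() is in units of the x variable, so how wide a box looks depends on how much of the x axis is shown, and xscale(range()) can extend that range.

                Code:
                sysuse bplong, clear

                bysort when: egen med = median(bp)
                bysort when: egen lqt = pctile(bp), p(25)
                bysort when: egen uqt = pctile(bp), p(75)

                * box positions just inside 1 and 2 (illustrative values)
                gen when2 = cond(when == 1, 0.95, 2.05)

                * narrower bars and an explicitly extended x axis
                twoway rbar lqt med when2, barw(.08) fcolor(gs12) lcolor(black) || ///
                    rbar med uqt when2, barw(.08) fcolor(gs12) lcolor(black) ///
                    legend(off) xlabel(1 "Before" 2 "After") ///
                    xscale(range(0.5 2.5)) ytitle(BP)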


                • #9
                  Got it. Thanks again!
