Stripplots

Nitish Upadhyaya

Join Date: Jan 2022

Posts: 10
#1

23 Jan 2022, 05:21

Second test post:

Hi

I am trying to move past traditional box and whisker plots given their limitations, and instead want to visualise my data using a stripplot. I am using Stata 16.1 and am very much a novice (although I have tried my best to work through the stripplot guidance).

Background

In terms of context, I have asked my participants to play a game. They are split into either condition 0 or condition 1. In each game, they can choose either to cooperate (responsedummy=1) or not to cooperate (responsedummy=0). I have also collected information about various personality traits which are each reported on a scale of 0-5 (extracted below in the dataex as an example are openness and agreeableness, but I have two further traits to analyse). I am trying to understand how specific levels of traits affect cooperation.

What I have tried so far

I started with a traditional plot using the following (I appreciate it is messy and has many pitfalls, and I could cut and visualise the data in many ways, hence my desire to investigate stripplots):

graph box openness agreeableness extraversion honestyhumility, over(condition) over(responsedummy) ylabel(, angle(horizontal))

But I am now exploring stripplots. I appreciate that the over() option can only be used with a single variable. I am comfortable producing a focused analysis, e.g. this for openness:

stripplot openness, over(responsedummy) box(barw(0.8) blcolor(ltblue)) jitter(3) centre vertical cumul cumpr mc(orange) scheme(s1color) yla(, ang(h))

I have also been exploring options such as

stripplot openness, over(responsedummy) box vertical stack h(0.4)
stripplot openness, over(responsedummy) box blcolor(white) iqr jitter(3) stack h(0.5) vertical mcolor(blue)

What I am hoping for some guidance on

I am trying to understand how personality traits affect cooperation. I will be running a probit model on the data set so what I am trying to do here is just give the reader as accurate a visual as possible in terms of the descriptive statistics.

I can't quite seem to get the hang of the by() option mentioned in posts such as this one.

Ideally I would like to create a stripplot which shows each personality trait side by side (like I have done with the box plot) for one condition, and then broken down by non-cooperation/cooperation. I can then replicate for the other condition.

I'd like to then be able to have another stripplot showing cooperation and all personality traits, and then non-cooperation and personality traits.

Once I get the basics, I'm hoping I'll be able to run with it! Many thanks in advance for any help.

Dataex

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id float(openness agreeableness condition responsedummy)
5460439 2.5 3.3 0 1
5460448 2.6 2.6 1 1
5460477 2.6 2 0 1
5460525 3.5 2.4 0 1
5460546 3.3 3.2 1 1
5460586 4.6 4.2 1 1
5460576 3.5 3 1 1
5460556 3.9 2.9 0 1
5460553 2.5 4.1 0 1
5460558 2.7 2.5 1 1
5460550 4.3 3 0 0
5460557 3.5 2.8 1 0
5460540 3.9 2.8 0 0
5460521 3.9 2.3 1 0
5460541 4.4 3.1 0 0
5460520 3.7 2.4 0 0
5460522 2.8 3.1 0 1
5460517 4.6 4.1 1 0
5460524 2.8 2.5 1 1
5460518 2.6 3.1 1 0
5460624 2.5 3.4 1 0
5460627 4 3.4 0 0
5460516 3.9 2.3 1 0
5460505 3.8 3.3 0 0
5460503 2.9 3.7 1 0
5460506 4.4 3.4 0 1
5460502 2.7 2.5 1 1
5460491 4.2 3.6 1 0
5460658 2.7 4 1 1
5460495 4.6 3.9 1 0
5460484 4 4.2 0 1
5460482 3.4 2.9 0 0
5460485 3.1 3.7 1 1
5460483 3.2 3.3 0 1
5460471 3.3 2.9 1 1
5460469 3.3 2.8 0 0
5460459 3 3.8 0 1
5460449 3.6 3.5 1 1
5460442 3.2 2.9 1 1
5460691 4.4 2.9 0 1
5460438 3 2.7 1 0
5460656 4.6 3.9 0 0
5460435 3.9 3.9 0 1
5460715 3.5 3.6 1 0
5460433 3.8 2.6 0 1
5460709 2.6 4.6 0 1
5460431 3.5 3.9 1 0
5460589 3.7 2.8 1 0
5460569 3.5 3.5 0 1
5460600 3.8 3.5 1 1
5460580 4.1 3.2 0 0
5460590 3.7 2.3 1 1
5460551 3.1 3.3 1 0
5460598 4.2 4.5 1 1
5460593 4.3 3.4 1 0
5460584 3.3 2.8 0 1
5460578 4.2 3.3 0 1
5460564 3.2 1.6 1 0
5460574 3.9 2.9 0 0
5460571 3.9 2.6 1 1
5460555 4.1 3.6 1 0
5460563 4.5 2.9 0 1
5460607 4 3.8 0 0
5460605 3 2.8 0 0
5460601 2.9 2.4 0 1
5460616 3.6 3.8 0 1
5460623 3.4 3.2 0 1
5460625 3.8 3.3 0 1
5460615 4.5 3.2 0 1
5460642 3.5 3.2 1 0
5460626 4.1 2.5 1 1
5460695 3 3.5 1 0
5460721 4.3 4.3 1 1
5460717 4.2 2.8 1 0
5460719 3 3.9 0 1
5460722 3.8 3.8 1 0
5460735 3.9 3.6 1 0
5460730 4.1 3.8 0 1
5460710 3.5 3 0 1
5460716 2.9 3.3 1 0
5460737 3.7 3 1 1
5460734 3.9 3.4 0 0
5460741 4.3 3.8 1 0
5460753 4.8 3.4 0 1
5460789 3.3 2.7 0 1
5460635 4.1 3.6 1 1
5460638 3.8 2.9 1 1
5460640 3.2 1.3 0 1
5460746 3.9 3 1 1
5460777 3.6 3.3 0 0
5460778 3.5 2.7 1 0
5460650 2.6 3.5 0 0
5460644 2.8 3.1 1 1
5460651 2.7 3.1 1 0
5460649 2.1 3.9 0 0
5460661 4.4 2.6 1 0
5460660 3 4.4 1 0
5460647 3.2 3.5 1 1
5460671 4 4.3 0 1
5460652 4 2.2 0 1
end

Last edited by Nitish Upadhyaya; 23 Jan 2022, 05:24.

Nick Cox

Join Date: Mar 2014
Posts: 35211

23 Jan 2022, 07:36

stripplot is from SSC as you are asked to explain (FAQ Advice #12).

Thanks for the data example. I have various small and large points.

jitter() is legal but a bad idea together with the cumul option. It undoes much of the good that cumul does in segregating values.

Never mess with the width of any median, quartiles box when cumul cumpr is applied with box -- as the box width has concrete meaning as stretching from cumulative probability 0.25 to cumulative probability 0.75. That's the point: a quantile plot and a box plot are compatible because the median and quartiles can be shown on both and because the horizontal scale of the quantile plot is cumulative probability.

Otherwise here's some technique. Wanting to show everything at once seems ambitious to me.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id float(openness agreeableness condition responsedummy)
5460439 2.5 3.3 0 1
5460448 2.6 2.6 1 1
5460477 2.6   2 0 1
5460525 3.5 2.4 0 1
5460546 3.3 3.2 1 1
5460586 4.6 4.2 1 1
5460576 3.5   3 1 1
5460556 3.9 2.9 0 1
5460553 2.5 4.1 0 1
5460558 2.7 2.5 1 1
5460550 4.3   3 0 0
5460557 3.5 2.8 1 0
5460540 3.9 2.8 0 0
5460521 3.9 2.3 1 0
5460541 4.4 3.1 0 0
5460520 3.7 2.4 0 0
5460522 2.8 3.1 0 1
5460517 4.6 4.1 1 0
5460524 2.8 2.5 1 1
5460518 2.6 3.1 1 0
5460624 2.5 3.4 1 0
5460627   4 3.4 0 0
5460516 3.9 2.3 1 0
5460505 3.8 3.3 0 0
5460503 2.9 3.7 1 0
5460506 4.4 3.4 0 1
5460502 2.7 2.5 1 1
5460491 4.2 3.6 1 0
5460658 2.7   4 1 1
5460495 4.6 3.9 1 0
5460484   4 4.2 0 1
5460482 3.4 2.9 0 0
5460485 3.1 3.7 1 1
5460483 3.2 3.3 0 1
5460471 3.3 2.9 1 1
5460469 3.3 2.8 0 0
5460459   3 3.8 0 1
5460449 3.6 3.5 1 1
5460442 3.2 2.9 1 1
5460691 4.4 2.9 0 1
5460438   3 2.7 1 0
5460656 4.6 3.9 0 0
5460435 3.9 3.9 0 1
5460715 3.5 3.6 1 0
5460433 3.8 2.6 0 1
5460709 2.6 4.6 0 1
5460431 3.5 3.9 1 0
5460589 3.7 2.8 1 0
5460569 3.5 3.5 0 1
5460600 3.8 3.5 1 1
5460580 4.1 3.2 0 0
5460590 3.7 2.3 1 1
5460551 3.1 3.3 1 0
5460598 4.2 4.5 1 1
5460593 4.3 3.4 1 0
5460584 3.3 2.8 0 1
5460578 4.2 3.3 0 1
5460564 3.2 1.6 1 0
5460574 3.9 2.9 0 0
5460571 3.9 2.6 1 1
5460555 4.1 3.6 1 0
5460563 4.5 2.9 0 1
5460607   4 3.8 0 0
5460605   3 2.8 0 0
5460601 2.9 2.4 0 1
5460616 3.6 3.8 0 1
5460623 3.4 3.2 0 1
5460625 3.8 3.3 0 1
5460615 4.5 3.2 0 1
5460642 3.5 3.2 1 0
5460626 4.1 2.5 1 1
5460695   3 3.5 1 0
5460721 4.3 4.3 1 1
5460717 4.2 2.8 1 0
5460719   3 3.9 0 1
5460722 3.8 3.8 1 0
5460735 3.9 3.6 1 0
5460730 4.1 3.8 0 1
5460710 3.5   3 0 1
5460716 2.9 3.3 1 0
5460737 3.7   3 1 1
5460734 3.9 3.4 0 0
5460741 4.3 3.8 1 0
5460753 4.8 3.4 0 1
5460789 3.3 2.7 0 1
5460635 4.1 3.6 1 1
5460638 3.8 2.9 1 1
5460640 3.2 1.3 0 1
5460746 3.9   3 1 1
5460777 3.6 3.3 0 0
5460778 3.5 2.7 1 0
5460650 2.6 3.5 0 0
5460644 2.8 3.1 1 1
5460651 2.7 3.1 1 0
5460649 2.1 3.9 0 0
5460661 4.4 2.6 1 0
5460660   3 4.4 1 0
5460647 3.2 3.5 1 1
5460671   4 4.3 0 1
5460652   4 2.2 0 1
end

set scheme s1color 

local opts vertical yla(, ang(h)) mc(black) ms(Sh)

stripplot openness, over(responsedummy) box(blcolor(blue)) centre cumul cumpr   `opts' name(G1, replace)

stripplot openness, over(responsedummy)  box(blcolor(blue)  barw(0.06)) pctile(0) whiskers(lc(blue)) boffset(-0.1) `opts' stack h(0.4) name(G2, replace)


local toshow 

foreach v in openness agreeableness { 
    separate `v', by(responsedummy) veryshortlabel 
    local toshow `toshow' `r(varlist)'
}

stripplot `toshow', box(blcolor(blue)) centre cumul cumpr   `opts' name(G3, replace) xaxis(1 2) ///
xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("") 

*  need better value labels! 
label def condition 0 "condition 0" 1 "condition 1"
label val condition condition 

stripplot `toshow', box(blcolor(blue)) by(condition, note("")) centre cumul cumpr   `opts' name(G4, replace) xaxis(1 2) ///
xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("") subtitle(, nobox fcolor(none))


Nitish Upadhyaya

Join Date: Jan 2022
Posts: 10

24 Jan 2022, 01:45

Hi Nick Cox

Thank you so much for taking the time to respond in such a detailed manner. I totally understand your points on jitter() and being careful with widths when cumul cumpr is applied. I also think you are right that trying to show everything at once is a bit too ambitious, especially as the reader will likely be overloaded. As such, I am going to go for the slightly simpler stripplot examples you gave for each individual trait. I'd taken the clearer value labels out for posting on the forum.

I haven't, however, been able to make the below work for me. I keep getting a "local not found" error which I haven't been able to puzzle out. I know it'll be something silly I am doing on my end (see the extract from Stata below). It would be good to use this opportunity to learn for next time.

Originally posted by Nick Cox

Code:

local toshow

foreach v in openness agreeableness {
    separate `v', by(responsedummy) veryshortlabel
    local toshow `toshow' `r(varlist)'
}

stripplot `toshow', box(blcolor(blue)) centre cumul cumpr `opts' name(G3, replace) xaxis(1 2) ///
xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("")

* need better value labels!
label def condition 0 "condition 0" 1 "condition 1"
label val condition condition

stripplot `toshow', box(blcolor(blue)) by(condition, note("")) centre cumul cumpr `opts' name(G4, replace) xaxis(1 2) ///
xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("") subtitle(, nobox fcolor(none))

Stata output when running highlighted portion of above

Code:

. local toshow

.
. foreach v in openness agreeableness {
  2.     separate `v', by(responsedummy) veryshortlabel
  3.     local toshow `toshow' `r(varlist)'
  4. }

              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
openness0       float   %9.0g                 0
openness1       float   %9.0g                 1

              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
agreeableness0  float   %9.0g                 0
agreeableness1  float   %9.0g                 1

.
. stripplot `toshow', box(blcolor(blue)) centre cumul cumpr   `opts' name(G3, replace) xaxis(1 2) ///
> xla(1 "0" 2 "1" 3 "0" 4 "1", tlcolor(bg) axis(2)) xla(1.5 "openness" 3.5 "agreeableness", tlcolor(bg) axis(1)) ///
> xtitle(response, axis(2)) xline(2.5, lc(gs8) lw(thin)) ytitle("")
local not found
r(111);


Nick Cox

Join Date: Mar 2014

Posts: 35211
#4

24 Jan 2022, 03:15

See https://www.stata-journal.com/articl...article=dm0102 on local macros. My guess is that you need to run the script as a whole, not one line of code at a time from the Do-file Editor window.
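A minimal sketch of the point about local macros, assuming lines are selected and run separately from the Do-file Editor. A local macro exists only within the run that defines it, and an undefined local expands to an empty string rather than raising an error:

```stata
* Run these two lines together: the local exists when it is used.
local opts vertical yla(, ang(h))
display "`opts'"

* If the -display- line is instead selected and run on its own later,
* `opts' is no longer defined in that run and expands to nothing,
* so any options stored in it silently drop out of the command.
```

The same applies to `toshow' in the script above: it must be built by the foreach loop in the same run as the stripplot call that uses it.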

Nitish Upadhyaya

Join Date: Jan 2022

Posts: 10
#5

26 Jan 2022, 11:47

Cheers Nick Cox - it isn't quite working for me but I've decided, as you flagged, to pare back what I am trying to show in each diagram and therefore your guidance on the initial piece has come in very useful.

Nick Cox

Join Date: Mar 2014
Posts: 35211

26 Jan 2022, 19:00

The main idea is that the script should go in the do-file Editor window and then you run the entirety of the code at once.

It does work! Here for example is the last graph from #2. My concern is just whether it gets too crowded if you show four variables, not just openness and agreeableness.

The code would be something like

Code:

local toshow

foreach v in openness agreeableness extraversion honestyhumility {      
    separate `v', by(responsedummy) veryshortlabel      
    local toshow `toshow' `r(varlist)' 
}  

* with four traits, separator lines between trait groups fall at 2.5, 4.5 and 6.5
stripplot `toshow', box(blcolor(blue)) centre cumul cumpr `opts' name(G5, replace) xaxis(1 2) ///
xla(1 "0" 2 "1" 3 "0" 4 "1" 5 "0" 6 "1" 7 "0" 8 "1", tlcolor(bg) axis(2)) ///
xla(1.5 "openness" 3.5 "agreeableness" 5.5 "extraversion" 7.5 "honesty-humility", tlcolor(bg) axis(1)) ///
xtitle(response, axis(2)) xline(2.5 4.5 6.5, lc(gs8) lw(thin)) ytitle("")

and so on.


Nitish Upadhyaya

Join Date: Jan 2022

Posts: 10
#7

30 Jan 2022, 01:36

Cheers Nick Cox for coming back on this. I had been running the entirety of the code at once (and just tried your additional suggestion) but I am still getting a "local not found" error. I clearly have something set up wrong on my end. Any ideas so I can bottom this out for the future? And I couldn't see the screen capture in your above post either. Really appreciate your patience with this!

Nick Cox

Join Date: Mar 2014

Posts: 35211
#8

30 Jan 2022, 05:09

Show the result of

Code:

which stripplot

Before the stripplot call put

Code:

set trace on

and then show us a chunk of the code trace before the command fails.
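One way to collect such a trace is sketched below. The stripplot call is one of the simpler commands from #1 and stands in for whichever call fails; tracedepth 1 is a guess that the failure sits in stripplot's own code rather than in a program it calls.

```stata
* limit the trace to the top-level program (an assumption; raise if needed)
set tracedepth 1
set trace on

* capture noisily shows output and errors but lets the do-file carry on,
* so that tracing is always switched off again afterwards
capture noisily stripplot openness, over(responsedummy) box vertical stack h(0.4)

set trace off
```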

Eddy Simms

Join Date: Dec 2019
Posts: 38

30 Jan 2022, 08:34

Nick,
I can reproduce the error:

Code:

. which stripplot
*! 2.9.0 NJC 10 July 2021

Code:

 if r(N) {
      local s = cond(r(N)>1,"s","")
      local N : di %11.0fc r(N)
      local N `N'
      di in bl "(`N' missing value`s' generated)"
      }
      }
    - rename `dummy' `name'
    = rename __00000E __000006
    --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- end egen ---
  - if `"`tufte'`tufte2'"' != local pctile = 0
  = if `""' != local pctile = 0
local not found
    if "`iqr'`iqr2'" != "" {
    egen `upper' = max(cond(`data' <= `upq' + `mult' * (`upq' - `loq'), `data', .)), by(`by' _stack)
    egen `lower' = min(cond(`data' >= `loq' - `mult'

Code:

    }
    if "`cumulate'" != "" {
    tempvar count negstack
    gen `negstack' = -_stack
    sort `by' `negstack' `data' `separate', stable
    if "`cumprob'" != "" {
    by `by' `negstack' : gen `count' = (_n - 0.5)/_N
    }
    else by `by' `negstack' : gen `count' = _n
    su `count', meanonly
    if "`centre'`center'" != "" {
    if "`cumprob'" != "" {
    by `by' `negstack' : replace `count' = `count' - 0.5
    }
    else by `by' `negstack' : replace `count' = _n - (_N + 1)/2
    }
    replace _stack = _stack + `height' * `count' / r(max)
    }
    local which "`copystack'"
    }
    }
  ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ end stripplot ---
r(111);

end of do-file

Stata 17

Eddy


Nick Cox

Join Date: Mar 2014

Posts: 35211
#10

30 Jan 2022, 10:29

Thanks very much to Eddy Simms. That's very helpful. You've been bitten by a bug in stripplot 2.9.0 -- and I've been able to work out why it did not bite me. Sorry about that.

The code should be

Code:

if `"`tufte'`tufte2'"' != "" local pctile = 0

if you are willing to edit the .ado file.

Otherwise, if you send me a private message on Statalist with an email address, I can send you revised code directly. Failing that, a revised copy will be posted on SSC, at Kit Baum's convenience.
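For anyone puzzled by the wording of the error, here is a minimal sketch of the faulty and corrected lines. It uses a hypothetical macro name, flag, in place of stripplot's own tufte macros. With the "" missing, the word local presumably gets read as part of the comparison and looked up as a name, which would explain the message "local not found".

```stata
* hypothetical stand-in for the (empty) tufte options
local flag

* Faulty form, as in the trace. After macro expansion it reads
*     if `""' != local pctile = 0
* so there is nothing on the right-hand side to compare against.
* It is left commented out here because it stops with r(111).
* if `"`flag'"' != local pctile = 0

* Corrected form: compare against the empty string
if `"`flag'"' != "" local pctile = 0
```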

Last edited by Nick Cox; 30 Jan 2022, 10:31.

Eddy Simms

Join Date: Dec 2019

Posts: 38
#11

30 Jan 2022, 10:46

Thank you very much. Edited the code and everything works.

Eddy