Dear Stata-users,
Using Stata 18 on Windows 10, I would like to create a descriptive strip plot of clustered data, both over and by categories (two layers of grouping). I want to show each data point as a dot, overlaid with a box plot.
There are 21 units of observation ("id"), numbered 1 to 21. A second variable, "idstr", holds the same values stored as strings. Each id has 4 observations of the Y variable. The variable "type" is the exposure of interest across which I want to compare; it has three categories, coded 0/1/2.
The data are in long format. Dataex, code and noted problems to solve are below.
Graph 2 and combined Graphs 7-9 are the closest to what I need.
I would be grateful for any insights or suggestions. Thank you!
Janine
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte id str2 idstr byte sub_id double Y byte type
1 "1" 1 .07488189048603522 2
1 "1" 2 .5067875067108304 2
1 "1" 3 .29499822792932273 2
1 "1" 4 .018557113586428975 2
2 "2" 1 .4113476486861821 0
2 "2" 2 .9195005728815538 0
2 "2" 3 .5432439113061359 0
2 "2" 4 .41028091230066566 0
3 "3" 1 .9072961712824079 0
3 "3" 2 .8448987600966915 0
3 "3" 3 .26780287305047035 0
3 "3" 4 .27379354098684305 0
4 "4" 1 .966479198040105 0
4 "4" 2 .6341329194538046 0
4 "4" 3 .8212810231346063 0
4 "4" 4 .23324948596647577 0
5 "5" 1 .07873996407893169 1
5 "5" 2 .677842755864262 1
5 "5" 3 .11788779304927188 1
5 "5" 4 .9637632250605132 1
6 "6" 1 .08891648492100579 1
6 "6" 2 .7266541288236086 1
6 "6" 3 .5956550213456991 1
6 "6" 4 .11259534194901744 1
7 "7" 1 .48240122063694946 1
7 "7" 2 .34795262613972267 1
7 "7" 3 .8433304450260144 1
7 "7" 4 .4317569130636417 1
8 "8" 1 .2937572250893631 2
8 "8" 2 .9510877785738538 2
8 "8" 3 .7105466649350736 2
8 "8" 4 .005007799058933338 2
9 "9" 1 .6042019611288234 1
9 "9" 2 .11513986519583042 1
9 "9" 3 .1983651475242354 1
9 "9" 4 .06401639069526355 1
10 "10" 1 .7604807247137322 1
10 "10" 2 .5340250780305599 1
10 "10" 3 .7143345464844253 1
10 "10" 4 .9449475199789127 1
11 "11" 1 .5347245697100783 1
11 "11" 2 .8283370474700306 1
11 "11" 3 .4127192172926123 1
11 "11" 4 .958545345904852 1
12 "12" 1 .838533152932602 1
12 "12" 2 .2455713597533643 1
12 "12" 3 .44871007570076915 1
12 "12" 4 .9175279017503324 1
13 "13" 1 .5574824506486208 1
13 "13" 2 .4238900581338134 1
13 "13" 3 .22908874233425924 1
13 "13" 4 .9974159465045283 1
14 "14" 1 .05477617525407785 2
14 "14" 2 .3230299731492071 2
14 "14" 3 .17531699588470107 2
14 "14" 4 .6094178721405584 2
15 "15" 1 .6385848074664882 2
15 "15" 2 .8899438910415776 2
15 "15" 3 .44820822217793554 2
15 "15" 4 .2629934481093047 2
16 "16" 1 .6043580459007394 2
16 "16" 2 .2597256465301314 2
16 "16" 3 .37404740302320694 2
16 "16" 4 .1885804343837293 2
17 "17" 1 .03545208904574093 2
17 "17" 2 .0817467913998039 2
17 "17" 3 .8439653489601434 2
17 "17" 4 .04956033989221853 2
18 "18" 1 .3722326177886752 2
18 "18" 2 .9846106419931462 2
18 "18" 3 .03563558815406753 2
18 "18" 4 .5738691024526158 2
19 "19" 1 .3330087987974606 2
19 "19" 2 .9444268942414267 2
19 "19" 3 .2474485598948324 2
19 "19" 4 .7479053152466856 2
20 "20" 1 .397547867876096 0
20 "20" 2 .44053960989013097 0
20 "20" 3 .2801748489806747 0
20 "20" 4 .34413739004932964 0
21 "21" 1 .19849761919324282 2
21 "21" 2 .9173630043593013 2
21 "21" 3 .8280002046875402 2
21 "21" 4 .285534958188877 2
end
// compare type groups for Y (not taking into account clustering): Graph 1
stripplot Y, over(type) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g1, replace)
// single compact graph of Y data over "id" by "type": Graph 2
stripplot Y, over(id) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g2, replace)
/* Problem:
1. the x-axis shows the same range (1-21) in every panel; it doesn't rescale to reflect the unequal number of id categories per type
*/
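// For reference, the unequal number of ids per type mentioned above can be checked as follows.
// This is only a small sketch: the variable name "tag" is my own and is not used elsewhere.
egen byte tag = tag(id)     // marks one observation per id
tabulate type if tag        // counts distinct ids within each type
drop tag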
// single compact graph of Y data over "idstr" by "type": Graph 3
stripplot Y, over(idstr) by(type, xrescale compact row(1) ) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g3, replace)
/* Problems:
1. the x-axis shows the same range (1-21) in every panel; it doesn't rescale to reflect the unequal number of id categories per type
2. the x-axis labels 1 to 21 do not appear in numeric order
*/
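// Possible cause of problem 2 (my assumption, not checked against the stripplot source):
// string values sort as text, so "10" comes before "2". A quick way to see that order:
preserve
contract idstr              // one row per distinct idstr, plus _freq
sort idstr
list idstr, noobs
restore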
// combine three graphs for each "type" over "id" : combined Graphs 4-6
stripplot Y if type == 0, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g4, replace)
stripplot Y if type == 1, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g5, replace)
stripplot Y if type == 2, over(id) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g6, replace)
graph combine g4 g5 g6, row(1)
/* Problems:
1. the x-axis shows the same range (1-21) in every graph; it doesn't rescale to reflect the unequal number of id categories per type
2. the graphs are separate rather than compact
3. the graphs are scaled inconsistently
*/
// combine three graphs for each "type" over "idstr" : combined Graphs 7-9
stripplot Y if type == 0, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g7, replace)
stripplot Y if type == 1, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g8, replace)
stripplot Y if type == 2, over(idstr) by(type, xrescale compact row(1) note("")) vertical box(bfcolor(gs11) blcolor(gs1)) pct(0.1) ytitle("mock outcome") title("") ytitle("") ylab(,angle(horizontal) nogrid) xlab( , nogrid) jitter(3) mcolor(blue) name(g9, replace)
graph combine g7 g8 g9, row(1)
/* Problems:
1. the x-axis labels are not ordered numerically
2. the graphs are separate rather than compact
3. the graph with the fewest id categories is scaled inconsistently compared to the graphs with more id categories
4. the x-axis title shows "id" instead of "idstr"
*/