graph box plot percentage of categorical variable

Julian Pritsch

Join Date: Apr 2014

Posts: 80
#1

06 May 2014, 07:02

Hi,

Stata version 13.1 on Windows 7

I have a hierarchical dataset (persons nested in zip-code areas, plz). My variable of interest (foc) is ordinal with 4 categories (0-3). I also have a dummy variable, osten, which divides my respondents into people from the West and the East of the country.

My goal: (see photo)
What I would like to end up with is one graph with two box plots, one for osten=0 and one for osten=1, showing the percentage of people in categories 2 and 3 of foc at the zip-code level.

Normally I would use the -collapse- command: collapse foc osten, by(plz)

I know that -collapse- offers different statistics (mean, max, min, p1, p50, etc.), but unlike in SPSS, where the AGGREGATE command has a pgt (percentage greater than x) option, I cannot find a way to save a new variable at the zip-code level that represents the percentage of people in the two highest categories of my dependent variable.

Any suggestions?

Nick Cox

Join Date: Mar 2014

Posts: 35211
#2

06 May 2014, 08:16

This problem has, I think, the same structure as yours.

Code:

sysuse citytemp
egen pc_high = mean(100 * (tempjuly > 75 & tempjuly < . )), by(region division)
egen tag = tag(region division)
graph box pc_high if tag, over(region)

The box plots look a bit odd in this case, but the principle is I think sound.

The % of values in a group is 100 times the mean of an indicator of being within that group. egen prefers the syntax of the mean of 100 times an indicator, but that's between friends.

The tagging technique is just to select the summaries once for each combination, not repeatedly.
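
As a rough check of the two ideas above (percent as the mean of an indicator, and tagging one observation per group), something like the following sketch could be run. The variable name high is only an illustrative choice and is not part of the code above.

Code:

sysuse citytemp, clear

* 0/1 indicator; "tempjuly < ." keeps missing values from counting as high
generate byte high = (tempjuly > 75 & tempjuly < .)

* Percent high per region-division cell, repeated on every observation in the cell
egen pc_high = mean(100 * high), by(region division)

* Mark exactly one observation per region-division cell
egen tag = tag(region division)

* Each cell should now be listed once, with its percentage
list region division pc_high if tag, noobs

* Cross-check: row percentages of the same indicator from tabulate
tabulate division high, row nofreq

The percentages in the column for high == 1 of the tabulate output should agree with the listed pc_high values, since divisions are nested within regions in this dataset.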

See also http://www.stata.com/support/faqs/da...ary-variables/ for a write-up.

I'd prefer a different display here, but that's a different story.

Last edited by Nick Cox; 06 May 2014, 08:55.
Julian Pritsch

Join Date: Apr 2014

Posts: 80
#3

06 May 2014, 09:45

Dear Nick,
thanks to your quick answer I was able to create the graph I needed.
Julian Pritsch

Join Date: Apr 2014

Posts: 80
#4

06 May 2014, 10:28

After another thought, I used only parts of Nick's suggested code.

Remember from my first post:
foc -> categorical 0-3
osten -> dummy 0/1 (West/East of country)
plz -> zip-code identifier

goal: distribution of the percentage of people in categories 2 and 3 across zip-code areas, split by osten

Code:

egen pc_high = mean(100*(foc>=2 & foc<.)), by(plz)
collapse pc_high osten, by(plz)

The graph produced by the following code was what I intended:

Code:

graph box pc_high, over(osten) nooutsides
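
A possible variant, sketched here with made-up toy data standing in for the real file, does the same job with -collapse- alone. It assumes osten is constant within each plz. Note one difference: the expression (foc>=2 & foc<.) evaluates to 0 when foc is missing, so respondents with missing foc stay in the denominator; the sketch below instead leaves the indicator missing for them, which may or may not be what is wanted. The variable name high is hypothetical.

Code:

* Toy data for illustration only (values are invented)
clear
input plz osten foc
1 0 0
1 0 2
1 0 3
1 0 .
2 0 1
2 0 1
3 1 3
3 1 2
4 1 0
4 1 3
end

* Indicator scaled to 0/100, left missing where foc is missing
generate high = 100 * (foc >= 2) if !missing(foc)

* One row per zip code: the mean of the indicator is the percentage in categories 2 and 3
collapse (mean) pc_high = high (first) osten, by(plz)

graph box pc_high, over(osten) nooutsides

Since -collapse- replaces the data in memory, wrapping these steps in preserve and restore would keep the person-level data available afterwards.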