
  • graph box plot percentage of categorical variable

    Hi,

    Stata version 13.1 on Windows 7

    I have a hierarchical dataset (persons nested in zip-code areas, plz). My variable of interest (foc) is ordinal with 4 categories (0-3). I also have a dummy variable, osten, which divides my respondents into people from the West and the East of the country.


    My goal: (see photo)
    What I would like to end up with is one graph with two box plots, for osten=0 and osten=1, showing the percentage of people in categories 2 and 3 of foc at the zip-code level.

    Normally I would use the -collapse- command: collapse foc osten, by(plz)

    I know that the collapse command offers different statistics (mean, max, min, p1, p50, etc.), but I cannot find a way (as in SPSS with the pgt (percentage greater than x) option of the AGGREGATE command) to save a new variable at the zip-code level that holds the percentage of respondents in the two highest categories of my dependent variable.

    Any suggestions?


  • #2
    This problem has, I think, the same structure as yours.

    Code:
    sysuse citytemp
    egen pc_high = mean(100 * (tempjuly > 75 & tempjuly < . )), by(region division)
    egen tag = tag(region division)
    graph box pc_high if tag, over(region)
    The box plots look a bit odd in this case, but the principle is I think sound.

    The % of values in a group is 100 times the mean of an indicator of being within that group. egen prefers the syntax of the mean of 100 times an indicator, but that's between friends.
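
    A minimal sketch of that identity, using the same citytemp data. The variable names hot and pc_hot are made up for illustration. One deliberate difference from the code above: the indicator here is set to missing when tempjuly is missing, so those observations drop out of the denominator instead of counting as "not high". Which behaviour is wanted depends on the question.

    Code:
    sysuse citytemp, clear
    * indicator: 1 if July temperature is above 75, 0 if not, missing if unknown
    generate byte hot = tempjuly > 75 if !missing(tempjuly)
    * percentage in each region = mean of 100 times the indicator
    egen pc_hot = mean(100 * hot), by(region)
    * cross-check: row percentages for hot == 1 should agree with pc_hot
    tabulate region hot, row nofreq
    tabstat pc_hot, by(region)

    egen's mean() ignores missing values of its argument, which is why the "if !missing()" qualifier is enough to change the denominator.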

    The tagging technique is just to select the summaries once for each combination, not repeatedly.

    See also http://www.stata.com/support/faqs/da...ary-variables/ for a write-up.

    I'd prefer a different display here, but that's a different story.
    Last edited by Nick Cox; 06 May 2014, 08:55.



    • #3
      Dear Nick,
      thanks to your quick answer I was able to create the graph I needed.



      • #4
        After another thought I only used parts of Nick's suggested code.

        Remember from my first post:
        foc -> categorical 0-3
        osten -> dummy 0/1 (West/East of country)
        plz -> zip-code identifier

        Goal: the distribution across zip-code areas of the percentage of people in categories 2 and 3, split by osten

        Code:
        egen pc_high = mean(100*(foc>=2 & foc<.)), by(plz)
        collapse pc_high osten, by(plz)
        The following code then gave me the graph I was aiming for:
        Code:
        graph box pc_high, over(osten) nooutsides
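
        For readers who want to stay within collapse alone, here is a hedged sketch of an equivalent route that skips egen. It runs on simulated data, since the original dataset is not available: the 100 zip-code areas, the 20 persons per area, the East/West split, and the helper variable high are all assumptions made for illustration. It also assumes osten is constant within each plz, so that (first) picks up the right value.

        Code:
        clear
        set seed 12345
        set obs 2000
        * hypothetical structure: 100 zip-code areas with 20 persons each
        generate plz = ceil(_n / 20)
        * hypothetical split: areas 61-100 are in the East
        generate byte osten = plz > 60
        * simulated ordinal outcome with categories 0-3
        generate byte foc = floor(4 * runiform())
        * 100 times the indicator for categories 2 and 3; missing foc stays missing
        generate high = 100 * (foc >= 2) if !missing(foc)
        * one row per zip-code area: percentage high, plus the area's osten value
        collapse (mean) pc_high = high (first) osten, by(plz)
        graph box pc_high, over(osten)

        Because the indicator is missing wherever foc is missing, the collapsed mean is the percentage among respondents with a valid answer. The version with foc>=2 & foc<. instead counts missing answers in the denominator.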

