
  • Categorical variables and a scatterplot diagram

    Hello all,

    I'd like to have a scatterplot with a "Max Ratio" and a "Min Ratio" per sector (e.g. Agriculture, Forestry, etc.), with the two points for each sector shown together. To give you a better idea of what I want, I've uploaded my current twoway scatterplot. First, I'd like the x axis to show the different categories (Agriculture, Forestry, etc.). Second, I'd like the two points per sector to sit directly above one another. For example, the red dots represent the Wholesale & Retail sector, but they are far away from each other; I'd like them to be vertically in line with each other.

    The data structure is as follows (agriculture and forestry have no value in the second row because each has a single minimum wage for the entire sector). Also, I don't have any values under "ratio" because I simply want the graph to show the values of the different 'x' variables.

    Code:
    ratio  agriculture  forestry  domesticworkers  privatesecurity  wholesaleandretail  taxisector  hospitality  contractcleaners
        .          .39       .38               .3              .85                   1         .42          .46               .48
        .            .         .              .25              .29                 .37          .3          .41               .44
    My code is as follows:

    Code:
    twoway (scatter ratio agriculture forestry domesticworkers privatesecurity wholesaleandretail taxisector hospitality contractcleaners)
    [Attachment: Graph.png]


  • #2
    Some basic misunderstandings here. The basic structure of a scatter plot command in Stata is scatter yvar1 ... yvark xvar with one or more y variables and one x variable (only!). It seems that you have one y variable and several x variables, but Stata only knows to take the last-named variable as x variable, with the bizarre result you show. So, at a minimum, you need a different graph command or need to restructure your data, but let's start at the beginning.
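
    As a quick illustration of that syntax, here is a minimal sketch using the auto dataset shipped with Stata. The variable choices are arbitrary and only for illustration; they are not from the original question.

```stata
sysuse auto, clear

* one y variable and one x variable: mpg on the y axis, weight on the x axis
scatter mpg weight

* two y variables and one x variable: the last-named variable (weight) is x,
* so mpg and trunk are both plotted against weight
scatter mpg trunk weight
```

    The second command is the pattern the code in #1 falls into: every variable except the last is treated as a y variable.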

    You evidently have 8 economic sectors

    agriculture
    forestry
    domestic workers
    private security
    wholesale and retail
    taxi
    hospitality
    contract cleaners

    but with that many categories you will struggle to put them readably on the x axis, so I would not do that. Essentially, the imperative of a readable graph trumps the convention of showing a response or outcome on the y axis.

    Absent your data to play with, here is a sandbox showing some technique.

    Code:
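    sysuse auto, clear
    set scheme s1color
    
    * minimum and maximum of mpg within each category of rep78
    egen min = min(mpg), by(rep78)
    egen max = max(mpg), by(rep78)
    
    * value labels so that the axis shows text rather than the codes 1 to 5
    label def rep78 1 appalling 2 abysmal 3 average 4 admirable 5 amazing
    label val rep78 rep78
    
    * horizontal spikes from min to max, with a marker at each end
    twoway rspike min max rep78, horizontal yla(1/5, ang(h) valuelabel noticks) ///
    xtitle("`: var label mpg'")  ///
    || scatter rep78 min, mc(blue) ///
    || scatter rep78 max , mc(blue) legend(off)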
    [Attachment: rooney.png]


    Basic points:

    Here I have only 5 categories, not 8, and my names are typically shorter than yours. So, they will go better as horizontal text on the y axis than squeezed somehow on the x axis.

    You can show a range as an interval and/or two (or even one) data point.

    Screenshots are less satisfactory than Stata graphs saved directly as .png and posted as attachments (FAQ Advice Section 12).
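
    Applied to the data in #1, the same technique might look like the sketch below. It rests on several assumptions that the thread does not confirm: the first row holds the maximum ratios and the second row the minimum ratios, the garbled ". 3" under domesticworkers is read as .3, and the empty ratio variable is left out. The axis title "Ratio" and the variable name sector are also my own choices. xpose turns each sector into one observation; agriculture and forestry keep a missing minimum, so they show a single point and no spike.

```stata
clear

* the wide layout from #1: row 1 = maximum ratios, row 2 = minimum ratios (assumed)
input agriculture forestry domesticworkers privatesecurity wholesaleandretail taxisector hospitality contractcleaners
.39 .38 .3  .85 1   .42 .46 .48
.   .   .25 .29 .37 .3  .41 .44
end

* transpose so that each sector becomes one observation;
* the varname option keeps the old variable names in _varname
xpose, clear varname
rename (v1 v2) (max min)

* numeric identifier carrying the sector names as value labels
encode _varname, gen(sector)

* horizontal spikes from min to max, with a marker at each end
twoway rspike min max sector, horizontal ///
    yla(1/8, ang(h) valuelabel noticks) ytitle("") xtitle("Ratio") ///
    || scatter sector min, mc(blue) ///
    || scatter sector max, mc(blue) legend(off)
```

    Note that encode numbers the sectors in alphabetical order of the variable names. If another order is wanted (say, by maximum ratio), the identifier would have to be built differently.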




    Last edited by Nick Cox; 28 Apr 2015, 05:09.
