  • Labeling lines in graph without the legend

    Dear Statalist:

    I'm trying to make a graph that shows the trend of an agency's hiring practices over 5 years (2018-2022). I want to plot the percent of grades (ranging from 1-15) yearly posted.
    My code for this is ...
    Code:
    tostring year, replace
    gen agyyear= agy + year 
    gen id = _n
    egen number_agyyear=count(id), by(agyyear) // count the number of obs. with an id by agyyear 
    destring year, replace 
    
    foreach n of numlist 1/15 {
    egen sum_grade_count`n'=sum(grade_count`n'), by(agyyear)  // count the number of obs. by each grade by agyyear 
    gen pcagyyear_grade_count`n'=100*sum_grade_count`n'/number_agyyear
    }
    where "agy" is the name of the agency and "pcagyyear_grade_count*" is the percent of grade* jobs posted. I'm trying to plot a line graph of each agency by, for example,
    Code:
    line pcagyyear_grade_count* year if agy=="AG", legend(off)  xtitle("") ylabel(, angle(0))
    I got rid of the legend because it occupies a lot of space with an already packed 15 lines. Instead, what I want to do is put labels beside the lines. I've tried to follow Post#2 in https://www.statalist.org/forums/for...aphs-with-line , but failed as I could not understand the logic. Could anyone help me out with this? Thanks in advance.

    My data looks like the following.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int year byte(grade_count1 grade_count2 grade_count3 grade_count4 grade_count5 grade_count6 grade_count7 grade_count8 grade_count9 grade_count10 grade_count11 grade_count12 grade_count13 grade_count14 grade_count15) str2 agy
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 "TR"
    2018 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 "TR"
    2018 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 "TR"
    2018 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "DN"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 "TR"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "DN"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "DN"
    2018 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 "DN"
    2018 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 "TR"
    end

  • #2
    First off, your code can be simplified. This

    Code:
    tostring year, replace
    gen agyyear= agy + year
    gen id = _n
    egen number_agyyear=count(id), by(agyyear) // count the number of obs. with an id by agyyear
    destring year, replace
    
    foreach n of numlist 1/15 {
    egen sum_grade_count`n'=sum(grade_count`n'), by(agyyear)  // count the number of obs. by each grade by agyyear
    gen pcagyyear_grade_count`n'=100*sum_grade_count`n'/number_agyyear
    }
    
    line pcagyyear_grade_count* year if agy=="AG", legend(off)  xtitle("") ylabel(, angle(0))
    is I think equivalent to this. I refer to the egen function total() as the equivalent function sum() has been undocumented since Stata 9.

    Code:
    bysort agy year: gen number_agyyear = _N // count the number of obs. by agy year
    
    forval n = 1/15 {
    egen sum_grade_count`n' = total(grade_count`n'), by(agy year)  // count the number of obs. by each grade by agy and year
    gen pcagyyear_grade_count`n'=100*sum_grade_count`n'/number_agyyear
    }
    
    line pcagyyear_grade_count* year if agy=="AG", legend(off)  xtitle("") ylabel(, angle(0))
    Your data example doesn't allow testing of this code.
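
    Since the data example cannot exercise this code, here is a minimal sketch on a made-up toy dataset that checks that the two approaches agree for a single grade. The data values and the variable names n_old, n_new, sum_old and sum_new are invented for illustration only.

    ```stata
    * toy data: invented values, not from the thread
    clear
    input int year byte grade_count1 str2 agy
    2018 1 "AG"
    2018 0 "AG"
    2019 1 "AG"
    2018 0 "TR"
    end

    * original approach: string identifier plus egen count() and sum()
    gen agyyear = agy + string(year)
    gen id = _n
    egen n_old = count(id), by(agyyear)
    egen sum_old = sum(grade_count1), by(agyyear)

    * simplified approach: _N under bysort, and egen total()
    bysort agy year: gen n_new = _N
    egen sum_new = total(grade_count1), by(agy year)

    * both assertions should pass silently if the approaches agree
    assert n_old == n_new
    assert sum_old == sum_new
    ```

    If either assert fails on your real data, the two versions are not equivalent there and the difference is worth tracking down before plotting.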

    The question is all about what is often called direct labelling (labeling), identifying lines or other elements by texts within the plot region rather than legend entries.

    I will revisit the 2017 thread linked in #1, simplify to a bare minimum and add more commentary. The data are it seems performance measures for various banks.

    There are 4 variables to plot in a connected line plot against year which runs from 2007 to 2016. So, we are going to add text to the right of the marker for 2016.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year float(DPO_mean DSO_mean DIO_mean CCC_mean)
    2007 57.08471  89.50652  31.17609   63.5979
    2008 52.71099  85.30817  38.95167  71.54886
    2009 52.26665  83.26951  33.05361 64.056465
    2010 48.94011  73.19818  39.60703   63.8651
    2011 49.55682 66.066086   42.8837  59.39296
    2012 44.97477 73.837616  29.16683  58.02968
    2013 39.46252  68.19304  25.82038   54.5509
    2014 41.48348  57.01266 23.771826    39.301
    2015 45.03276 68.085014 23.520824  46.57307
    2016 60.15359  75.83691  32.09263  47.77595
    end
    Just a scheme I like:

    Code:
    set scheme s1color
    I set up colours in advance. I need 4 colours because I have 4 lines to plot. Here (as compared with the earlier thread) I change cyan to magenta, which is lurid, but stands out in my view. I will explain tokenize later.

    Code:
    local colours "red blue black magenta"
    tokenize "`colours'"
    Options include 4 marker symbols (not absolutely essential, but in my view a good idea) and the same colours for line and marker symbols. I need to stretch the x axis scale a bit to give space for the direct labels.

    Code:
    * common options
    local common sort xtitle("") ms(Oh Dh X Sh) lc(`colours') mc(`colours')
    local common `common' legend(off) xsc(r(. 2016.8)) yla(, ang(h))
    local common `common' xla(2007/2016, labsize(medsmall))
    Now here is the tricky part. I need to get positions for explanatory text from 2016 values. So I use summarize to get the mean of each variable for 2016 -- which is a mean of one value, so it's just a way of looking up the value for 2016. I loop over 1 to 4 as well and build up a local macro with a call to a text() option that says where to put text and what the text should be and how to show it. I look up the colours from earlier -- they are in local macros 1 to 4, which is where tokenize put them earlier.

    Code:
    local j = 1
    foreach v in DPO DSO DIO CCC {
         su `v'_mean if year == 2016, meanonly
         local call `call' text(`r(mean)' 2016 "  `v'", color(``j'') place(r))  
         local ++j
    }
    The looping in parallel is explained at https://journals.sagepub.com/doi/pdf...6867X211063415
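
    To see what the loop builds, it can help to display the pieces as they are assembled. This is a minimal sketch on a two-variable, two-year extract of the data above; the display line and the use of macro list are additions for illustration and are not needed for the graph.

    ```stata
    clear
    input int year float(DPO_mean DSO_mean)
    2015 45.03276 68.085014
    2016 60.15359  75.83691
    end

    local colours "red blue"
    tokenize "`colours'"        // local macro 1 now holds red, 2 holds blue

    local call                  // start from an empty macro
    local j = 1
    foreach v in DPO DSO {
         su `v'_mean if year == 2016, meanonly
         * r(N) is the number of observations used; one is expected here
         display "`v': colour ``j'', y position " r(mean) ", observations used " r(N)
         local call `call' text(`r(mean)' 2016 "  `v'", color(``j'') place(r))
         local ++j
    }

    * show the option text that would be handed to twoway
    macro list _call
    ```

    Emptying call before the loop is a precaution: in an interactive session a local macro persists, so running the loop twice would otherwise append a second set of text() options to the first.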

    Now I can draw my graph.

    Code:
    twoway connected ???_mean year, `common' `call' subtitle(Means (days))

    Here is the code all at once for anyone who wants to play.


    Code:
    clear
    input int year float(DPO_mean DSO_mean DIO_mean CCC_mean)
    2007 57.08471  89.50652  31.17609   63.5979
    2008 52.71099  85.30817  38.95167  71.54886
    2009 52.26665  83.26951  33.05361 64.056465
    2010 48.94011  73.19818  39.60703   63.8651
    2011 49.55682 66.066086   42.8837  59.39296
    2012 44.97477 73.837616  29.16683  58.02968
    2013 39.46252  68.19304  25.82038   54.5509
    2014 41.48348  57.01266 23.771826    39.301
    2015 45.03276 68.085014 23.520824  46.57307
    2016 60.15359  75.83691  32.09263  47.77595
    end
    
    set scheme s1color
    
    local colours "red blue black magenta"
    tokenize "`colours'"
    
    local common sort xtitle("") ms(Oh Dh X Sh) lc(`colours') mc(`colours')
    local common `common' legend(off) xsc(r(. 2016.8)) yla(, ang(h))
    local common `common' xla(2007/2016, labsize(medsmall))
    
    local j = 1
    foreach v in DPO DSO DIO CCC {
         su `v'_mean if year == 2016, meanonly
         local call `call' text(`r(mean)' 2016 "  `v'", color(``j'') place(r))  
         local ++j
    }
        
    twoway connected ???_mean year,  `common' `call'  subtitle(Means (days))
    FWIW, I never (well, hardly ever) write code like this line by line from the outset. Rather, I have a script in a do-file that evolves as I correct errors and change my mind about details.

    Here is the graph.
    [Graph: direcllabel2.png]

    You have 15 measures and nothing rules out the possibility of a messier graph.
    Last edited by Nick Cox; 14 Jul 2022, 15:49.



    • #3
      Thanks so much for the line by line explanation, Nick.
      I now have a sense of what's going on here and tried to apply your code to my data as follows:
      Code:
      set scheme s1color
      
      local colours "red blue black magenta"
      tokenize "`colours'"
      
      * common options
      local common sort xtitle("") ms(Oh Dh X Sh) lc(`colours') mc(`colours')
      local common `common' legend(off) xsc(r(. 2022.9)) yla(, ang(h))
      local common `common' xla(2018/2022, labsize(medsmall))
      
      local j = 1
      foreach v in pcagyyear_grade_count6 pcagyyear_grade_count7 pcagyyear_grade_count8 pcagyyear_grade_count9 {
           su `v' if year == 2022, meanonly
           local call `call' text(`r(mean)' 2022 "  `v'", color(``j'') place(r))  
           local ++j
      }
      
      twoway connected pcagyyear_grade_count6 pcagyyear_grade_count7 pcagyyear_grade_count8 pcagyyear_grade_count9 year if agy=="AG", `common' `call' subtitle(Means (days))
      Just to test the code, I used 4 of the 15 grades and applied your code to them with some adjustments (e.g., the years). However, I'm having problems adjusting the location of the labels. The following is the best I have so far.
      Graph.gph

      Also, what might be the reason my dataex is not testable? I've run into this in a couple of other posts and I don't know why sometimes it works and sometimes not. Many thanks in advance.



      • #4
        Working backwards:

        Your data example shows only one year and no observations for which agy == "AG", so although the latter condition could be ignored, the data don't allow seeing quite how well the design would work.

        Please give .png examples of Stata graphs as explained at https://www.statalist.org/forums/help#stata 12.4 and 12.5.

        Below is your graph as png

        There are two minor problems. One is that you don't have enough space for your variable names but here I would just use 6 to 10 as text labels. Another is that the subtitle is a carry-over from the application in the 2017 thread.

        There is one major problem. The text is clearly in the wrong place and the reason is presumably that you take means over all 2022 values to find positions but need to specify if agy == "AG"

        Code:
        set scheme s1color
        
        local colours "red blue black magenta"
        tokenize "`colours'"
        
        * common options
        local common sort xtitle("") ms(Oh Dh X Sh) lc(`colours') mc(`colours')
        local common `common' legend(off) xsc(r(. 2022.9)) yla(, ang(h))
        local common `common' xla(2018/2022, labsize(medsmall))
        
        local j = 1
        foreach v in 6 7 8 9 {
             su  pcagyyear_grade_count`v' if year == 2022  & agy=="AG", meanonly
             local call `call' text(`r(mean)' 2022 "  `v'", color(``j'') place(r))  
             local ++j
        }
        
        twoway connected pcagyyear_grade_count6 pcagyyear_grade_count7 pcagyyear_grade_count8 pcagyyear_grade_count9 year if agy=="AG", `common' `call'


        • #5
          #4 should be 6 to 9, not 6 to 10.



          • #6
            Many thanks, Nick. This was very helpful. Apologies for the .gph upload. Now I've applied this to 15 other lines and got the following. I'll eventually have to play around more to figure out a way to present this better (eg., collapsing categories, making the labels smaller), but I understand the logic. Thanks!

            [Graph: graph1.png]




            • #7
              It seems that you have data for 2018 but not for this subset. You may have to work at cutting the useless x axis label for 2018.
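
              One possible way to do that, sketched on invented toy data (the variable pc and all values are hypothetical): look up the first and last year present in the subset with summarize and pass them to xla() instead of hard-coding 2018/2022.

              ```stata
              * toy data: invented values, with no 2018 observation for AG
              clear
              input int year float pc str2 agy
              2018 10 "TR"
              2019 12 "AG"
              2020 15 "AG"
              2021 11 "AG"
              2022 14 "AG"
              end

              * meanonly also leaves r(min) and r(max) behind
              su year if agy == "AG", meanonly
              local first = r(min)
              local last  = r(max)
              local right = `last' + 0.9     // room for direct labels

              twoway connected pc year if agy == "AG", sort xtitle("") ///
                  xla(`first'/`last') xsc(r(. `right'))
              ```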

              Otherwise if your percents are all positive, I would recommend trying logit scale.

              Code:
              twoway function logit(x), range(0.005 0.33)
              shows how logit stretches low values apart relative to higher values over a range from 0.5% to 33%.
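
              A minimal sketch of how a logit scale might be applied to a percent variable, assuming every percent is strictly between 0 and 100 (logit is undefined at 0 and 100). The toy data and the names pc and logit_pc are invented for illustration. The idea is to plot the transformed variable but label the axis with the original percents.

              ```stata
              * toy data: invented values
              clear
              input int year float pc
              2018 0.8
              2019 2.5
              2020 6
              2021 14
              2022 30
              end

              * logit() expects a proportion, so divide the percent by 100
              gen logit_pc = logit(pc / 100)

              * build axis labels: position on logit scale, text in percent
              local labels
              foreach p in 1 2 5 10 20 30 {
                  local where = logit(`p' / 100)
                  local labels `labels' `where' "`p'"
              }

              twoway connected logit_pc year, yla(`labels', ang(h)) ///
                  ytitle("percent (logit scale)") xtitle("")
              ```

              With direct labelling, the y positions in text() would need the same transformation as the plotted variable.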



              • #8
                That's right. Some subsets range from 2018-2021, others 2019-2022, so yes, I have to work on the x axis for subsets. For all agencies aggregated, the graph looks something like...
                [Graph: graph1.png]

                I tried to minimize the overlapping of letters by "size(tiny)" in the loop. I guess this is the best I have so far. I'll try out the logit scaling for subsets as well. Many thanks!



                • #9
                  See also fabplot from the Stata Journal. https://journals.sagepub.com/doi/pdf...6867X211025838



                  • #10
                    The fabplot is incredible, Nick! Just one quick command gives something much more intuitive. Thanks!

                    Code:
                    ssc install fabplot
                    fabplot line pcagyyear_grade_count12 year, by(agy) yla(20 40 60 80 100) front(connect) xtitle("") ytitle("Change in percent of GS12 by agency") frontopts(lw(thick))
                    [Graph: graph1.png]
