  • Two way pyramid graph using countries instead of range

    Hello! I'm trying to replicate a graph like this. It's a sort of pyramid, but instead of being divided into ranges, each horizontal bar belongs to one country, with two different variables plotted.

    When I use twoway bar var1 country, horizontal || bar var2 country , I get the following error "string variables not allowed in varlist".

    Any thoughts?
    [Attached image: Captura de Pantalla 2022-12-14 a la(s) 20.18.43.png]

  • #2
    I assume the country variable is a string. You could try something like:
    Code:
    encode country, gen(id)
    twoway bar var1 id, horizontal || bar var2 id , horizontal
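To see what encode does to the category axis, here is a minimal sketch on made-up data. The country names and numbers are invented for illustration, and var1, var2 and country are simply the placeholder names from #1. The integer codes from encode carry value labels, so the axis can show the country names rather than the codes.

```stata
* Toy data, invented for illustration only
clear
input str10 country var1 var2
"Chile"  12  9
"Mexico" 20 15
"Peru"    8 11
end

* encode assigns 1, 2, 3 in alphabetical order and stores the
* original strings as value labels of the new numeric variable
encode country, gen(id)

* Show the value labels, not the integer codes, on the y axis
twoway bar var1 id, horizontal barwidth(.4) ///
    || bar var2 id, horizontal barwidth(.4) ///
    || , ylabel(1/3, valuelabel angle(0) noticks) ytitle("")
```

Both sets of bars start from zero on the same side here, so they overlap; this sketch only shows how the labelling works.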



    • #3
      Not the question, but I wouldn't use a pyramid at all for these data. For me -- apart from Egyptian prototypes -- the term pyramid mostly conjures up diagrams used to portray the age and sex structure of populations. These remain popular and easy to understand in principle, but they are not especially effective in practice beyond showing age structure, which is only part of the story.

      In particular, small variations in the difference or ratio of females and males at different ages are often both interesting and important, but reading them off the graph is too hard, as that requires the reader to pick up each bar mentally and flip it over to the other side. I defy anyone to be able to do that precisely and accurately, as what is needed often is to discern shifts of the order of 1%. Needing some device like placing a rule(r) against a screen or printed example, or needing to look at a table, makes my point for me. Going down to the magnitude axis and estimating each number again and again is also tedious and pointless when there are better alternatives.

      Here's a post saying that at greater length -- with data (!) and graph (!!) and a scholarly reference (!!!). Anyone able to double my reference list would earn my gratitude.

      https://www.statalist.org/forums/for...oway-bar-graph

      What Tom Sawyer (really Mark Twain) said in another context applies here:

      Often, the less there is to justify a traditional custom, the harder it is to get rid of it.
      Now the data in #1 aren't age, sex and number, but data like that in #1 surely share the problem. I would recommend graph dot with marker(1, ms(Oh)) marker(2, ms(+))
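A minimal sketch of that suggestion, assuming Stata's bundled census dataset as a stand-in for the data in #1 (the choice of dataset, the 20-state subset and the sort order are my assumptions, not part of the question):

```stata
* Stand-in data: first 20 states (by abbreviation) from the bundled census file
sysuse census, clear
sort state2
keep in 1/20

* (asis) plots the values as they are; one row per state, two markers per row
graph dot (asis) marriage divorce, over(state2, sort(1) descending) ///
    marker(1, ms(Oh)) marker(2, ms(+)) ///
    legend(order(1 "marriages" 2 "divorces"))
```

Open circles and plus signs are used because they remain readable when two markers fall close together.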




      • #4
        Originally posted by Scott Merryman View Post
        I assume the country variable is a string. You could try something like:
        Code:
        encode country, gen(id)
        twoway bar var1 id, horizontal || bar var2 id , horizontal
        That kind of worked, but I couldn't get one variable on each side. And is there any graph I can make without treating the y-axis as a numerical variable, using it instead to refer to a specific observation? Like the first graph.


        [Attached image: Captura de Pantalla 2022-12-15 a la(s) 17.46.53.png]



        • #5
          You need to negate one variable to get it shown to the left of the vertical axis.



          • #6
            You need to set one of the values as a negative number. For example:

            Code:
            sysuse census.dta,clear
            
            keep state2 marriage divorce
            encode state2, gen(id)
            keep if id <= 20
            gen divorce2 = -divorce
            
            sum marriage
            gen x = r(max)*1.2
            
            twoway bar marriage id, horiz barwidth(.5)   /// 
                || bar divorce2 id, horiz barwidth(.5)  ///
                || scatter id x, mlabel(state2) msymbol(none)  mlabsize(*1.1) mlabcolor(black) /// 
                || ,legend(pos(3) order(1 "M" 2 "D"))  yscale(off) ylabel(,nogrid)  /// 
                    xlabel(-100000 "100000" 0 100000 200000) xscale( range(300000))
            [Attached image: Graph.png]



            • #7
              As an extension of #2 let's take the sandbox data used constructively by Scott Merryman in #6 and show some alternative technique.

              graph dot is very flexible, although twoway can be even more flexible at a small cost of a little more work.

              Code:
              sysuse census.dta,clear
              
              sort state2
              keep in 1/20 
              sort marriage 
              gen id = _n
              
              * labmask is from the Stata Journal and needs to be installed before it can be used 
              labmask id, values(state2)
              
              scatter id divorce, ms(+) || scatter id marriage, ms(Oh) ///
              xsc(log) xla(1000 3000 10000 30000 100000 300000) xsc(alt) ///
              yla(1/20, valuelabel ang(h) noticks grid) yaxis(1 2) ///
              yla(1/20, valuelabel ang(h) noticks axis(2)) ytitle("", axis(2)) ytitle("") ///
              legend(order(1 "divorces" 2 "marriages"))

              What are the key points? See https://www.stata-journal.com/articl...article=gr0034 for ruminations on essentially this theme.


              [Attached image: marriages_divorces.png]


              1. A (Cleveland) dot chart as provided by graph dot or just produced here by twoway scatter avoids the request (or obligation) to compare bars on opposite sides of the graph. Just use markers that tolerate overlap or occlusion as much as possible.

              2. Axis labels on both sides may not appeal; the point is to show that you can have that if it is helpful.

              3. The idea is compatible with logarithmic (or even logit, square root, ...) scales should they seem helpful -- or even essential.

              4. If you no longer follow a multitude to draw bar charts as an unthinking default, then you have more flexibility and need not show zero. Conversely, the bar charts in #4 really need a zero base on the evidence provided.

              5. Whenever a graph has table flavour, I tend to put the horizontal axis at the top. That is a choice, and yours may differ. But see https://www.stata-journal.com/articl...article=gr0053 if you seek more discussion.

              6. Alphabetical order for place names is often a poor choice compared with sorting on an interesting or important response being plotted.
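If installing labmask is not convenient, much the same value labels can be built with official commands. This is only a sketch of one way to do it, not how labmask itself works internally; it assumes, as in the code above, that id runs from 1 to _N and that each observation has a distinct state2.

```stata
sysuse census, clear
sort state2
keep in 1/20
sort marriage
gen id = _n

* Attach each observation's state2 as the value label for its id
forvalues i = 1/`=_N' {
    label define id `i' "`=state2[`i']'", add
}
label values id id

scatter id marriage, ms(Oh) yla(1/20, valuelabel ang(h) noticks grid) ytitle("")
```

Because id is generated after sorting on marriage, the states appear in order of marriages rather than alphabetically, in line with point 6.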



              • #8
                Originally posted by Scott Merryman View Post
                You need to set one of the values as a negative number. For example:

                Code:
                sysuse census.dta,clear
                
                keep state2 marriage divorce
                encode state2, gen(id)
                keep if id <= 20
                gen divorce2 = -divorce
                
                sum marriage
                gen x = r(max)*1.2
                
                twoway bar marriage id, horiz barwidth(.5) ///
                || bar divorce2 id, horiz barwidth(.5) ///
                || scatter id x, mlabel(state2) msymbol(none) mlabsize(*1.1) mlabcolor(black) ///
                || ,legend(pos(3) order(1 "M" 2 "D")) yscale(off) ylabel(,nogrid) ///
                xlabel(-100000 "100000" 0 100000 200000) xscale( range(300000))
                I really love this page! Thank you so much! Making it negative, that is so clever! It worked! Thank you!!



                • #9
                  Originally posted by Scott Merryman View Post
                  You need to set one of the values as a negative number. For example:

                  Code:
                  sysuse census.dta,clear
                  
                  keep state2 marriage divorce
                  encode state2, gen(id)
                  keep if id <= 20
                  gen divorce2 = -divorce
                  
                  sum marriage
                  gen x = r(max)*1.2
                  
                  twoway bar marriage id, horiz barwidth(.5) ///
                  || bar divorce2 id, horiz barwidth(.5) ///
                  || scatter id x, mlabel(state2) msymbol(none) mlabsize(*1.1) mlabcolor(black) ///
                  || ,legend(pos(3) order(1 "M" 2 "D")) yscale(off) ylabel(,nogrid) ///
                  xlabel(-100000 "100000" 0 100000 200000) xscale( range(300000))
                  How would you sort the data here? Let's say by "divorce".



                  • #10
                    You need to sort the data on that variable, and generate the id from the sorted order, before calling -twoway bar-:

                    Code:
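                    sysuse census.dta,clear
                    
                    keep state2 marriage divorce
                    * negate divorce so its bars extend to the left of zero
                    gen divorce2 = -divorce
                    sort state2
                    keep in 1/20
                    * divorce2 is negative, so descending order puts the
                    * smallest divorce counts first (bottom of the graph)
                    gsort -divorce2
                    
                    * the bar position is the rank in the sorted order
                    gen id2 = _n
                    qui sum marriage
                    * x position for the state labels, right of the longest bar
                    gen x = r(max)*1.2
                    
                    
                    twoway bar marriage id2, horiz barwidth(.5) ///
                    || bar divorce2 id2, horiz barwidth(.5) ///
                    || scatter id2 x, mlabel(state2) msymbol(none) mlabsize(*1.1) mlabcolor(black) ///
                    || ,legend(pos(3) order(1 "M" 2 "D")) yscale(off) ylabel(,nogrid) ///
                    xlabel(-100000 "100000" 0 100000 200000) xscale( range(300000))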
