  • Graph bar with five dummy variables and four time points

    Dear All,

    I am working in Stata 18.0 with a dataset that includes five dummy variables: untargeted_lr, untargeted_red, untargeted_imm, untargeted_eu and untargeted_gendiss.

    I would like to present a parsimonious bar chart (graph bar) that shows, on the same graph (so same axis), the percentages of the two categories for each dummy variable, at each of four time points: 2006, 2010, 2014, 2019.

    At the moment my code for one variable and four time points is the following:

    Code:
    graph bar if voted==1, over(untargeted_lr, gap(0))  asyvars bar(2, fcolor(yellow)) by(year, title("Untargeted voters on Left-Right Dimension")) blabel(bar, format(%4.1f) color(black))
    Graphically, I get the following:

    [Image: GraphBar Ulr party voted.png]


    First, I am trying to place these four bar charts side by side on the same axis. Second, I would like to add the other four dummies to the same graph.
    I would like to get something like this:

    [Image: Schermata 2024-04-18 alle 09.22.01.png]

    Here the y axis on the left would show percentages, the subtitles under the graphs would name each of the 5 dummy variables, and each pair of bars would show the two categories of one dummy variable at one point in time.

    Is it feasible?

    Thanks a lot
    Mattia

  • #2
    This really depends on details of your data that are hard to explain. The best way is to give us an example dataset to work with as is discussed in the Statalist FAQ (link in the black bar near the top of this page).

    Regardless of that I can make one comment: if you have a set of binary/indicator/dummy variables and you value parsimony, then you don't want to present both proportions: they add up to one (or a hundred if you work with percentages), so if you know one you also know the other.
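
    A minimal sketch of that point, using invented data (the variable names d and notd are hypothetical): the mean of a 0/1 indicator is the share of 1s, and the shares of the two categories sum to one, so reporting one of them is enough.

```stata
* Invented data: d is a hypothetical 0/1 indicator, notd is its complement
clear
set obs 200
set seed 12345
gen byte d = runiform() < 0.3
gen byte notd = 1 - d

* The mean of an indicator is the proportion of observations coded 1
summarize d, meanonly
local p1 = r(mean)
summarize notd, meanonly
local p0 = r(mean)

display "share of 1s = " %5.3f `p1' "   share of 0s = " %5.3f `p0'

* The two shares are complements, so one bar per variable carries all the information
assert abs(`p1' + `p0' - 1) < 1e-6
```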
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------


    • #3
      I was going to make the same point as Maarten Buis and indeed I would go further.

      To me, your second graph in #1 isn't consistent with your description in words, and it's hard to link the legend to the graph.

      But running with the idea of 5 indicator (*) variables and 4 time points still implies 20 bars and you need to be able to compare easily

      1. means for each variable across the years

      2. means for each variable with the other variables

      which suggests to me that a line graph would be a strong competitor -- or perhaps a dot chart in the sense of graph dot. We can't tell without a data example (+).

      Here is a token line graph with invented data.

      Code:
      clear 
      set obs 400
      set seed 2803
      gen year = 2016 + ceil(_n/100)
      
      forval j = 1/5 {
          gen y`j' = runiform() > `j'/7  
          egen y`j'_mean_pc = mean(100 * y`j'), by(year)
          label var y`j'_mean_pc "y`j'"
      }
      
      line y*pc year , ytitle(Percent) xtitle("")

      [Image: manylines.png]


      There is plenty of scope for improvement:

      1. twoway connected would work as well or better.

      2. Some colours might be replaced.

      3. Direct labelling (explanatory text next to each line) would be an improvement on a legend -- so long as there is space to do that.

      4. Naturally more informative variable labels should be used.

      5. Stata would start the scale at 0 if any of the percents were small, but starting at 0 would often be a deliberate choice anyway (though not if, say, all the percents were high).
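
      Points 1, 3 and 5 can be sketched together on the same invented data. This is one possible approach rather than a definitive recipe: the variables tag and lab1 to lab5 are hypothetical helpers, the direct labels are placed with an invisible scatter at the last year, and the axis ranges are assumptions that would need adjusting for real data.

```stata
* Same invented data as above
clear
set obs 400
set seed 2803
gen year = 2016 + ceil(_n/100)

* Build one invisible scatter per series that only carries a text label
local labels
forval j = 1/5 {
    gen y`j' = runiform() > `j'/7
    egen y`j'_mean_pc = mean(100 * y`j'), by(year)
    gen lab`j' = "y`j'"
    local labels `labels' (scatter y`j'_mean_pc year if tag & year == 2020, ms(none) mlabel(lab`j') mlabpos(3) mlabcolor(black))
}

* Plot one observation per year rather than 100 identical points
egen tag = tag(year)

* Connected lines, labels at the right-hand end instead of a legend,
* extra room on the x axis for the labels, and a y axis starting at 0
twoway (connected y*pc year if tag, sort) `labels', ///
    legend(off) ytitle(Percent) xtitle("") ///
    xscale(range(2017 2020.5)) xlabel(2017(1)2020) ylabel(0(20)100)
```

      If two series end at nearly the same value, their labels will overlap and one of them would need a different mlabpos().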



      (*) Section 2 of https://journals.sagepub.com/doi/pdf...36867X19830921 urges use of the term indicator variables over dummy variables.

      (+) A good data example for us need only give the means of each variable by year:


      Code:
      preserve 
      
      collapse y1 y2 y3 y4 y5 , by(year) 
      
      dataex 
      
      restore


      • #4
        Hi Maarten and Nick,

        you are both correct, thanks for the insights. First, yes, it would be more parsimonious to show just one of the two proportions. Second, here is the dataex.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input double year float(untargeted_lr untargeted_red untargeted_imm untargeted_eu untargeted_gendiss)
        2006 .010203773  .02608325  .02217769 .04898397  .02534294
        2010 .009533115 .034049455  .02064595         . .034953464
        2014 .011656462  .03425341 .017544555 .04806729   .0291466
        2019 .011453296   .0266131 .021872064 .03840274  .02858328
        end
        Third, I agree that the line graph could be a very good alternative.

        Sincerely
        Mattia


        • #5
          Thanks for the data example. These numbers are all small. I am assuming that they are proportions, i.e. means over {0, 1} variables.

          I didn't try hard with the bar charts. The reason for trying with hbar can be seen quickly by trying bar as well.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input double year float(untargeted_lr untargeted_red untargeted_imm untargeted_eu untargeted_gendiss)
          2006 .010203773  .02608325  .02217769 .04898397  .02534294
          2010 .009533115 .034049455  .02064595         . .034953464
          2014 .011656462  .03425341 .017544555 .04806729   .0291466
          2019 .011453296   .0266131 .021872064 .03840274  .02858328
          end
          
          foreach v of var untargeted* { 
              local label = substr("`v'", 12, .)
              label var `v' "`label'"
          }
          
          twoway connected untarget* year,  ytitle(Percent) lc(stc1 stc2 stc3 magenta black) ///
          mc(stc1 stc2 stc3 magenta black) yla(0 0.01 "1" 0.02 "2" 0.03 "3" 0.04 "4" 0.05 "5") ///
          legend(order(4 5 2 3 1)) name(G1, replace) xla(2006 2010 2014 2019) xtitle("")
          
          reshape long untargeted_, i(year) j(which) string 
          
          replace untargeted_ = 100 * untargeted_ 
          
          graph hbar untargeted_, over(year) over(which) name(G2, replace) ytitle("Percent")
          
          graph hbar untargeted_, over(which) over(year) name(G3, replace) ytitle("Percent")

          [Image: manylines2.png]


          I don't know what most of the suffixes mean.
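
          For comparison with the hbar calls, here is a sketch of the vertical version on the same data example. The graph name G4 and the label size are arbitrary choices. With 20 bars the year labels under a vertical chart may become crowded, which is presumably the kind of problem the horizontal layout avoids.

```stata
* Data example from #4
clear
input double year float(untargeted_lr untargeted_red untargeted_imm untargeted_eu untargeted_gendiss)
2006 .010203773  .02608325  .02217769 .04898397  .02534294
2010 .009533115 .034049455  .02064595         . .034953464
2014 .011656462  .03425341 .017544555 .04806729   .0291466
2019 .011453296   .0266131 .021872064 .03840274  .02858328
end

* Long layout: one row per year and variable suffix
reshape long untargeted_, i(year) j(which) string

* Proportions to percents
replace untargeted_ = 100 * untargeted_

* Vertical bars; (asis) plots the values as they are, one per year and suffix
graph bar (asis) untargeted_, over(year, label(labsize(small))) over(which) ///
    ytitle("Percent") name(G4, replace)
```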


          • #6
            Dear Nick,

            I actually get

            Code:
            reshape long untargeted_ if voted==1, i(year) j(which) string 
            (j = eu gendiss imm lr red)
            variable id does not uniquely identify the observations
                Your data are currently wide. You are performing a reshape long. You specified i(year) and
                j(which). In the current wide form, variable year should uniquely identify the observations.
                Remember this picture:
            This seems strange to me because, having double-checked, I do not have multiple observations per id.


            • #7
              #5 shows that I started with the reduced dataset you helpfully provided in #4.

              I can't see your full dataset and can only guess at why your reshape command didn't work. It could be that there are many missing values in the dataset as a side-effect of import from a spreadsheet file or that you have a different layout from what might be guessed.

              Either way, to reproduce (and vary) my code, start as I did.
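
              One possible route from a respondent-level dataset to that starting point is sketched below. The data here are invented, and the assumption that the full dataset has one row per respondent with 0/1 indicators and a voted variable is a guess, not something confirmed in this thread. In such a layout year repeats many times, so it cannot serve as i() until the data are reduced to one row per year.

```stata
* Invented respondent-level data: 100 respondents in each of four years
clear
set obs 400
set seed 2803
gen year = real(word("2006 2010 2014 2019", ceil(_n/100)))
gen byte voted = runiform() < 0.8
foreach s in lr red imm eu gendiss {
    gen byte untargeted_`s' = runiform() < 0.05
}

* Diagnose: year does not uniquely identify observations here
duplicates report year
capture noisily isid year

* Reduce to one row per year: means of the indicators among voters
collapse (mean) untargeted_* if voted == 1, by(year)
isid year

* Now year works as i() and the reshape goes through
reshape long untargeted_, i(year) j(which) string
list, sepby(year)
```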


              • #8
                Thanks a lot Nick, I managed to make it work.

                Sincerely
                Mattia
