

  • Making a bar graph for a subset of the data, including 'missing' categories and the average value


    I would like to create a bar graph showing the distribution of a variable (the amount in £ that survey respondents are willing to donate to a given charity), together with a thin vertical line at the average donation as a reference point.

    The 'problems' I am facing are:
    1) How can I show all categories, including those that nobody chose? (The bar graph currently shows bars only for values that respondents gave; e.g. no one gave £2, so that value is omitted. I presume it is good practice to include such values so that the graph is easier to read at a glance.)
    2) How can I graph these values for a subset of the data only? (I collected the data via a survey with two treatments, coded 1 and 2 in Groupid below. I would like to show the distribution of donations for Group 1 only, but so far I haven't been able to separate the groups.)
    3) How can I add a thin vertical line at the average value as a reference point? (Is there a command for this? Otherwise I have only found how to draw a line manually.)

    Below are (i) the current graph, (ii) the intended graph and (iii) a dataex example and my code.

    [Image: current graph (Graph.png)]

    [Image: intended graph (Graph intended-min.png)]

    graph bar, over(Donation, label(labsize(vsmall))) graphregion(color(white)) ytitle("Percent", size(small)) title("The distribution of individual willingness to fight global warming", size(small)) b1title("Donation (£)")
    * Example generated by -dataex-. For more info, type help dataex
    * dataex ResponseId Donation meanprice_for_group Groupid
    clear
    input str17 ResponseId int Donation float meanprice_for_group long Groupid
    "R_2bTC6yfJ9D3ZXql"  20 64.46154 1
    "R_Okvpu1Ga2S8cVUt"  52 64.46154 1
    "R_xzMwnLib43TGJPz" 110 64.46154 1
    "R_1gbT3pBht6XY1B6" 110 64.46154 1
    "R_1pmKGHiY7Zm6MFc" 100 64.46154 1
    "R_1igs0Ghw9Mlfdov"  50 64.46154 1
    "R_1DDbg6O6SDOCnrw"  80 64.46154 1
    "R_2rqmaKA6PEkEgfx"  80 64.46154 1
    "R_3HN682nKJ5Y1ycc"  20 64.46154 1
    "R_3qmbAsxwMlpEuiD"  60 64.46154 1
    "R_1OGKiQFyrd3TXmB"  84 64.46154 1
    "R_1CHgL5Cu0cJCjWX"  90 64.46154 1
    "R_43gV9fXPvO8iuVX"  95 64.46154 1
    "R_2D5K9tiKWYJXRXF"  30 64.46154 1
    "R_2EiApI6yUDBLlU8"  59 64.46154 1
    "R_rq2e4vek7UmF3r3"   1 64.46154 1
    "R_cMZrpeA00gjVIm5"  20 64.46154 1
    "R_e5SsOmIywfRwzTz"   1 64.46154 1
    "R_2CvbbvbXEzsJmG5"  90 64.46154 1
    "R_3nkKMH9SmNvNkld"  75 64.46154 1
    "R_XgFprN8sZrei7O9"  70 64.46154 1
    "R_1hLx1nTSNHDMAFe"  50 64.46154 1
    "R_23eofu3GpLXUeME" 110 64.46154 1
    "R_2sSenyiBiUJ0yCL"  30 64.46154 1
    "R_1PRwRAVy4xbeYpO" 110 64.46154 1
    "R_pQo42C2xKLt0r0l" 110 64.46154 1
    "R_3itk79Q8s3t8ycr" 110 64.46154 1
    "R_3OjlGGpuAqY11Oe"  71 64.46154 1
    "R_2R99vPGcdzNcZeG"  55 64.46154 1
    "R_3qVFZSidyrpekEJ" 110 64.46154 1
    "R_3qVBDmamjMnLpDc"  50 64.46154 1
    "R_1JDdR96i8TgYYJO"  64 64.46154 1
    "R_Wrsjwj8SOK6s48N"  95 64.46154 1
    "R_YbQgP2HxqDWAbM5"  10 64.46154 1
    "R_1YqSJ647fatsbCx"  39 64.46154 1
    "R_1LG2NFd7lLHLQgB"  84 64.46154 1
    "R_3Ld9vEyqBRqWM5b" 110 64.46154 1
    "R_uq8nRMxvVQ7ScOB"  90 64.46154 1
    "R_1FgK3GkHVgzrucq" 110 64.46154 1
    "R_RlycIrfuYWrEvbr"  82 64.46154 1
    "R_1lhRbnasoNVg3Yu" 110 65.63158 2
    "R_1LIqtfuaHirqNji"   0 65.63158 2
    "R_2B4muV5wfLQCQyP"  63 65.63158 2
    "R_1K8yADjlYukIKN9"   0 65.63158 2
    "R_6glzKqroF1tqA7f" 110 65.63158 2
    "R_SCy0Vvt3D6VnNT3"  10 65.63158 2
    "R_2SDlsAnTe4KjIMU"  84 65.63158 2
    "R_2pYyqdKDMIC2EEE"  90 65.63158 2
    "R_2SDnolevyhqSeg9"  20 65.63158 2
    "R_O8dN5Cj0zoj3NjX" 110 65.63158 2
    "R_1cUgIo8U2W2d5MA" 110 65.63158 2
    "R_yC2wrVmNkQCpWkp" 110 65.63158 2
    end
    label values Groupid Groupid
    label def Groupid 1 "Control", modify
    label def Groupid 2 "Empirical", modify
    Any help or guidance is greatly appreciated while I learn to find my way around Stata! Please let me know if any useful information is missing.

    Best regards

  • #2
    See the -if- qualifier to select groups

    help if
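    Thanks for the data example. You can try -twoway histogram- with the -discrete- option. Because the x axis is then numeric rather than a set of categories, values that nobody chose simply appear as gaps, and -xline()- draws the reference line at the mean.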

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str17 ResponseId int Donation float meanprice_for_group long Groupid
    "R_2bTC6yfJ9D3ZXql"  20 64.46154 1
    "R_Okvpu1Ga2S8cVUt"  52 64.46154 1
    "R_xzMwnLib43TGJPz" 110 64.46154 1
    "R_1gbT3pBht6XY1B6" 110 64.46154 1
    "R_1pmKGHiY7Zm6MFc" 100 64.46154 1
    "R_1igs0Ghw9Mlfdov"  50 64.46154 1
    "R_1DDbg6O6SDOCnrw"  80 64.46154 1
    "R_2rqmaKA6PEkEgfx"  80 64.46154 1
    "R_3HN682nKJ5Y1ycc"  20 64.46154 1
    "R_3qmbAsxwMlpEuiD"  60 64.46154 1
    "R_1OGKiQFyrd3TXmB"  84 64.46154 1
    "R_1CHgL5Cu0cJCjWX"  90 64.46154 1
    "R_43gV9fXPvO8iuVX"  95 64.46154 1
    "R_2D5K9tiKWYJXRXF"  30 64.46154 1
    "R_2EiApI6yUDBLlU8"  59 64.46154 1
    "R_rq2e4vek7UmF3r3"   1 64.46154 1
    "R_cMZrpeA00gjVIm5"  20 64.46154 1
    "R_e5SsOmIywfRwzTz"   1 64.46154 1
    "R_2CvbbvbXEzsJmG5"  90 64.46154 1
    "R_3nkKMH9SmNvNkld"  75 64.46154 1
    "R_XgFprN8sZrei7O9"  70 64.46154 1
    "R_1hLx1nTSNHDMAFe"  50 64.46154 1
    "R_23eofu3GpLXUeME" 110 64.46154 1
    "R_2sSenyiBiUJ0yCL"  30 64.46154 1
    "R_1PRwRAVy4xbeYpO" 110 64.46154 1
    "R_pQo42C2xKLt0r0l" 110 64.46154 1
    "R_3itk79Q8s3t8ycr" 110 64.46154 1
    "R_3OjlGGpuAqY11Oe"  71 64.46154 1
    "R_2R99vPGcdzNcZeG"  55 64.46154 1
    "R_3qVFZSidyrpekEJ" 110 64.46154 1
    "R_3qVBDmamjMnLpDc"  50 64.46154 1
    "R_1JDdR96i8TgYYJO"  64 64.46154 1
    "R_Wrsjwj8SOK6s48N"  95 64.46154 1
    "R_YbQgP2HxqDWAbM5"  10 64.46154 1
    "R_1YqSJ647fatsbCx"  39 64.46154 1
    "R_1LG2NFd7lLHLQgB"  84 64.46154 1
    "R_3Ld9vEyqBRqWM5b" 110 64.46154 1
    "R_uq8nRMxvVQ7ScOB"  90 64.46154 1
    "R_1FgK3GkHVgzrucq" 110 64.46154 1
    "R_RlycIrfuYWrEvbr"  82 64.46154 1
    "R_1lhRbnasoNVg3Yu" 110 65.63158 2
    "R_1LIqtfuaHirqNji"   0 65.63158 2
    "R_2B4muV5wfLQCQyP"  63 65.63158 2
    "R_1K8yADjlYukIKN9"   0 65.63158 2
    "R_6glzKqroF1tqA7f" 110 65.63158 2
    "R_SCy0Vvt3D6VnNT3"  10 65.63158 2
    "R_2SDlsAnTe4KjIMU"  84 65.63158 2
    "R_2pYyqdKDMIC2EEE"  90 65.63158 2
    "R_2SDnolevyhqSeg9"  20 65.63158 2
    "R_O8dN5Cj0zoj3NjX" 110 65.63158 2
    "R_1cUgIo8U2W2d5MA" 110 65.63158 2
    "R_yC2wrVmNkQCpWkp" 110 65.63158 2
    end
    label values Groupid Groupid
    label def Groupid 1 "Control", modify
    label def Groupid 2 "Empirical", modify
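    * mean donation in Group 1, kept in a local macro for the reference line
    sum Donation if Groupid==1
    local mean= r(mean)
    set scheme s1color
    * -discrete- gives one bar per value on a numeric axis, so unused values show as gaps
    tw hist Donation if Groupid==1, discrete xlab(0(5)110) lcolor(gray) fcolor(gray%50) percent xtitle("Donation (£)") xline(`mean')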
    [Image: resulting histogram (Graph.png)]
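The same recipe can be repeated for each treatment group so that both distributions, each with a line at its own mean, sit side by side. This is a minimal sketch, assuming the dataex example from #1 is in memory and that the code is run from a do-file (the /// continuation lines work only there); the graph names hist1 and hist2 and the macro names are made up for illustration.

```stata
* Sketch: one discrete histogram per group, each with a line at that
* group's own mean, combined side by side (assumes the dataex example
* from #1 is in memory; run from a do-file because of ///)
set scheme s1color
local names
levelsof Groupid, local(groups)
foreach g of local groups {
    * mean for this group only, stored for xline()
    summarize Donation if Groupid == `g', meanonly
    local mean = r(mean)
    * value label ("Control" or "Empirical") used as the panel title
    local lbl : label (Groupid) `g'
    twoway histogram Donation if Groupid == `g', discrete percent ///
        lcolor(gray) fcolor(gray%50) xlabel(0(10)110)             ///
        xtitle("Donation (£)") xline(`mean') title("`lbl'")       ///
        name(hist`g', replace)
    * collect the graph names for graph combine
    local names `names' hist`g'
}
* common axes make the two panels directly comparable
graph combine `names', ycommon xcommon
```

Using -ycommon- and -xcommon- is a design choice: without them each panel is scaled separately, which makes the groups harder to compare by eye.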


    • #3
      I join Andrew Musau in his implication that graph bar is not a good idea here, whereas a histogram may indeed be helpful.

      Much depends on whether #1 gives all the data or whether you're talking about, say, 500 or 5000 people.

      Other ideas to throw into the discussion are quantile plots and even stem-and-leaf plots. The large number of values of GBP 110 is striking.
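Both of the alternative displays mentioned here can be tried with official Stata commands. A minimal sketch, assuming the dataex example from #1 is in memory; restricting to Group 1 and the final tabulation of GBP 110 responses are only illustrations.

```stata
* Sketch using official Stata commands only (assumes the dataex
* example from #1 is in memory)

* stem-and-leaf display of Group 1 donations: every value stays
* visible, so the pile-up at 110 should show as a long row of leaves
stem Donation if Groupid == 1

* quantile plot: ordered donations plotted against the fraction of the data
quantile Donation if Groupid == 1, ytitle("Donation (£)")

* how many respondents in each group gave exactly 110
tabulate Groupid if Donation == 110
```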


      • #4
        Hi Andrew and Nick,

        Thank you very much for your help. I had been trying all sorts for many, many hours until this point and couldn't figure it out, so I really appreciate your advice.

        #1 is a small section of the data. I will look further into the commands used to better my understanding and look into your suggestions.

        Best wishes,


        • #5
          Here is a token quantile plot using qplot from the Stata Journal. The distributions could be superimposed if desired.

           qplot Donation, by(Groupid, note("")) scheme(s1color) yla(0(10)110, grid) xla(0 0.25 0.5 0.75 1)
          [Image: quantile plots of Donation by group (qplot_gleed.png)]
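To superimpose the two distributions, qplot itself may offer this directly (check its help file). As an alternative, the plot can be built by hand with official commands. This is a minimal sketch, assuming the dataex example from #1 is in memory, that Donation has no missing values, and that no variable called pp exists yet (pp is a hypothetical name); it uses the common plotting position (i - 0.5)/n within each group.

```stata
* Sketch: quantile plots for both groups on one set of axes, built by
* hand with official commands (assumes the dataex example from #1 is
* in memory, no missing Donation values, and no existing variable pp;
* run from a do-file because of ///)

* plotting position (i - 0.5)/n, computed separately within each group
bysort Groupid (Donation): generate pp = (_n - 0.5) / _N

twoway (scatter Donation pp if Groupid == 1, msymbol(Oh))      ///
       (scatter Donation pp if Groupid == 2, msymbol(+)),      ///
       legend(order(1 "Control" 2 "Empirical"))               ///
       ytitle("Donation (£)") xtitle("Fraction of the data")  ///
       ylabel(0(10)110, grid) xlabel(0 0.25 0.5 0.75 1)
```

Because each group's positions run from just above 0 to just below 1 regardless of group size, the two curves can be compared directly even though the groups have different numbers of respondents.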


          • #6
            I hadn’t considered a quantile plot – I will look into these more as they seem to provide a very useful visualisation of the data. Thank you very much Nick!

