  • Graphing the distribution of a wedge between two variables

    Hello,

    I have data on perceived and actual social norms and wish to have a variable that, for each respondent, records the deviation from the average across all respondents.

    I would like to create a bar graph showing the distribution of people's beliefs about the share of UK students who say they try to fight global warming in relation to the actual share i.e. a distribution of the 'wedge' between perceived social norms and the actual share.

    Variables:
    NBpriorBehSelf: "Do you try to fight global warming?" (Yes/No)
    NBpriorBehOthr: "Out of 100 people, how many students do you think try to fight global warming?" (1-100)


    I have tried various things and believe I am closest with the following commands but I'm left with the questions
    (a) how to create a variable for the 'actual %' (NBpriorBehSelf="yes" or 1)
    (b) how to graph it as I haven't quite yet managed to get the graph similar to the given one?

    egen actual=pc(NBpriorBehSelf) or
    egen actual=mean(NBpriorBehSelf) if Groupid==1
    generate deviation = NBpriorBehOthr - actual
    tw hist deviation, discrete lcolor(gray) fcolor(gray%50) percent xtitle("Wedge (guess%= % - actual %)")


    Included below are (i) the intended type of graph and (ii) dataex and code.

    [Figure: Figure A.3.png]


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    * dataex NBpriorBehSelf NBpriorBehOthr
    
    input byte(NBpriorBehSelf NBpriorBehOthr)
    1  80
    1  83
    1  80
    1  40
    1  60
    1  89
    1  65
    1  70
    1  78
    1  40
    1  50
    1  90
    1  46
    1  90
    2  61
    1  70
    1  90
    2  40
    1  97
    1  60
    1  88
    1  71
    1  80
    1  47
    1  40
    1  92
    2  87
    1  90
    1  90
    1  83
    1  98
    1 100
    1  80
    1  60
    1  60
    1  75
    1  80
    2  91
    1  80
    1  90
    1  85
    1  80
    1  71
    1  70
    1  61
    1  97
    1  57
    2  61
    1  95
    1  90
    1  80
    1  85
    1  81
    1  80
    1  90
    2   1
    1  95
    1  75
    2  66
    1  87
    1  80
    1  60
    1  70
    1  91
    1  44
    1  69
    1  90
    1  76
    2  20
    1  70
    2  10
    1  45
    1  70
    1  70
    1  47
    1  85
    1  81
    1  61
    1  59
    1  60
    2  60
    1 100
    1  80
    1  60
    1  73
    1  95
    1  80
    1  76
    1  83
    2  16
    2  78
    1  75
    1  68
    1  79
    1  94
    1  90
    2  17
    2  20
    1  60
    end
    label values NBpriorBehSelf NBpriorBehSelf_lbl
    label def NBpriorBehSelf_lbl 1 "yes", modify
    label def NBpriorBehSelf_lbl 2 "no", modify

    I really appreciate any advice while I learn and build my understanding of Stata's capabilities. If any further clarification or information would be useful, please let me know.

    Best wishes,
    Janina

  • #2
    You explain your data nicely and I think I understand them. I don't really follow the example graph and in any case think it goes too far from the actual data.

    As I understand it, there is an observed percent of students who say they fight, and we are then looking at what students think that observed percent is.

    This plot is a little complicated, and could indeed be simplified, but the essence is to show the data directly, plus

    observed percent as a reference line

    summary boxes following box plot conventions (medians plus quartiles)

    observed means as shorter reference lines.

    I use scheme s1color.

    I use stripplot from SSC. That can do various things, but the main idea used here goes back to Emanuel Parzen in 1979 (references in the help file). Oddly, or otherwise, I think he had a great idea to merge quantile and box plots, but it has had little impact partly because (a) his own data examples were not inspiring (b) he surrounded his message with elaborate mathematical details that were at best decorative and at worst obfuscating.
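
    For anyone reproducing this, a minimal setup sketch, assuming stripplot is not yet installed and that s1color is wanted as the scheme for the session:

    ```stata
    * install the community-contributed stripplot command from SSC
    * (assumption: it is not installed yet; add the replace option to update an existing copy)
    ssc install stripplot

    * use the s1color scheme for graphs drawn in this session
    set scheme s1color

    * the help file lists the references mentioned above
    help stripplot
    ```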

    This design makes evident an elementary but fundamental notion behind the box plot:

    If half the observations fall inside the box, then half fall outside it.

    And often each half is every bit as interesting and important as the other.

    There is secondary detail here too, such as the repetition of 60 70 80 90% as round-number guesses.
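
    A quick way to check that heaping on round numbers, as a sketch that assumes the data example is already in memory:

    ```stata
    * frequencies of each guess, most common values listed first
    tabulate NBpriorBehOthr, sort

    * how many guesses are exact multiples of 10
    count if mod(NBpriorBehOthr, 10) == 0
    ```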


    Code:
    clear
    * dataex NBpriorBehSelf NBpriorBehOthr
    
    input byte(NBpriorBehSelf NBpriorBehOthr)
    1  80
    1  83
    1  80
    1  40
    1  60
    1  89
    1  65
    1  70
    1  78
    1  40
    1  50
    1  90
    1  46
    1  90
    2  61
    1  70
    1  90
    2  40
    1  97
    1  60
    1  88
    1  71
    1  80
    1  47
    1  40
    1  92
    2  87
    1  90
    1  90
    1  83
    1  98
    1 100
    1  80
    1  60
    1  60
    1  75
    1  80
    2  91
    1  80
    1  90
    1  85
    1  80
    1  71
    1  70
    1  61
    1  97
    1  57
    2  61
    1  95
    1  90
    1  80
    1  85
    1  81
    1  80
    1  90
    2   1
    1  95
    1  75
    2  66
    1  87
    1  80
    1  60
    1  70
    1  91
    1  44
    1  69
    1  90
    1  76
    2  20
    1  70
    2  10
    1  45
    1  70
    1  70
    1  47
    1  85
    1  81
    1  61
    1  59
    1  60
    2  60
    1 100
    1  80
    1  60
    1  73
    1  95
    1  80
    1  76
    1  83
    2  16
    2  78
    1  75
    1  68
    1  79
    1  94
    1  90
    2  17
    2  20
    1  60
    end
    label values NBpriorBehSelf NBpriorBehSelf_lbl
    label def NBpriorBehSelf_lbl 1 "yes", modify
    label def NBpriorBehSelf_lbl 2 "no", modify
    
    label var NBpriorBehSelf "Do you try to fight global warming?"
    label var NBpriorBehOthr "% of students you think try to fight global warming"
    
    egen pc_yes = mean(100 * (NBpriorBehSelf == 1))
    su pc_yes, meanonly
    local show : display %3.1f r(mean)
    
    * /// continues a command over several lines; run this from a do-file
    stripplot NBpriorBehOthr, over(NBpriorBehSelf) cumul cumprob vertical box centre ///
        refline(lc(orange)) yla(, ang(h) axis(2)) xla(, noticks) ///
        xli(1.5, lw(vthin) lc(gs8)) yli(`show', lc(blue)) yaxis(1 2) ///
        yla(`show' `" "actual" "`show'%" "', axis(1) ang(h)) ///
        ytitle("", axis(1)) ytitle("`: var label NBpriorBehOthr'", axis(2)) ///
        xsc(titlegap(*5))
    [Figure: fightwarming.png]

    Last edited by Nick Cox; 15 Mar 2022, 13:18.



    • #3
      On a second reading I am clearer on the graph shown in #1, code for which might be something like

      Code:
      egen pc_yes = mean(100 * (NBpriorBehSelf == 1))
      
      gen wedge = NBpriorBehOthr - pc_yes 
      
      set scheme s1color 
      histogram wedge, start(-100) width(10) percent



      • #4
        Code:
         
         egen pc_yes = mean(100 * (NBpriorBehSelf == 1))



        • #5
          Hi Nick, thank you very much for your feedback. I hadn’t considered a graph merging quantile and box plots as shown; on reflection, I don’t think I have come across such a graph in papers before, so thank you for highlighting the possibilities here. The way the design communicates a lot of relevant information is certainly interesting, so I will play around with it some more.

          As a clarification for the graph provided as an example, the red vertical line indicates the actual share of people who say they try to fight global warming. This serves as a reference point against which deviations in beliefs about this share can be represented.



          • #6
            You could certainly add a vertical line to a histogram such as generated by the code in #3 and #4.
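
            A minimal sketch of that, assuming the data example from #1 is in memory and that pc_yes and wedge have not been created yet. Because the wedge is guess minus actual, the actual share sits at a wedge of zero; the red colour and the axis title are illustrative choices only.

            ```stata
            * actual percent saying yes, repeated in every observation
            egen pc_yes = mean(100 * (NBpriorBehSelf == 1))

            * each respondent's guess minus the actual percent
            gen wedge = NBpriorBehOthr - pc_yes

            * histogram of the wedge with a vertical reference line at zero,
            * i.e. where a guess would equal the actual share
            histogram wedge, start(-100) width(10) percent xline(0, lcolor(red)) xtitle("Wedge (guess % - actual %)")
            ```

            Alternatively, a histogram of NBpriorBehOthr itself could carry the line at the value of pc_yes, which keeps the axis in the original percent units.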

            There aren't many quantile-box plots that I know of but the stripplot help gives some references.
