Graphing the distribution of a wedge between two variables

Janina Gleed

Join Date: Feb 2022

Posts: 18
#1


15 Mar 2022, 12:27

Hello,

I have data on perceived and actual social norms and wish to have a variable that, for each respondent, records the deviation from the average across all respondents.

I would like to create a bar graph showing the distribution of people's beliefs about the share of UK students who say they try to fight global warming in relation to the actual share, i.e. a distribution of the 'wedge' between perceived social norms and the actual share.

Variables:
NBpriorBehSelf: "Do you try to fight global warming?" (Yes/No)
NBpriorBehOthr: "Out of 100 people, how many students do you think try to fight global warming?" (1-100)

I have tried various things and believe I am closest with the following commands but I'm left with the questions
(a) how to create a variable for the 'actual %' (NBpriorBehSelf="yes" or 1)
(b) how to graph it as I haven't quite yet managed to get the graph similar to the given one?

egen actual=pc(NBpriorBehSelf) or
egen actual=mean(NBpriorBehSelf) if Groupid==1
generate deviation = NBpriorBehOthr - actual
tw hist deviation, discrete lcolor(gray) fcolor(gray%50) percent xtitle("Wedge (guess%= % - actual %)")

Included below are (i) the intended type of graph and (ii) the dataex output and code

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
* dataex NBpriorBehSelf NBpriorBehOthr
input byte(NBpriorBehSelf NBpriorBehOthr)
1 80
1 83
1 80
1 40
1 60
1 89
1 65
1 70
1 78
1 40
1 50
1 90
1 46
1 90
2 61
1 70
1 90
2 40
1 97
1 60
1 88
1 71
1 80
1 47
1 40
1 92
2 87
1 90
1 90
1 83
1 98
1 100
1 80
1 60
1 60
1 75
1 80
2 91
1 80
1 90
1 85
1 80
1 71
1 70
1 61
1 97
1 57
2 61
1 95
1 90
1 80
1 85
1 81
1 80
1 90
2 1
1 95
1 75
2 66
1 87
1 80
1 60
1 70
1 91
1 44
1 69
1 90
1 76
2 20
1 70
2 10
1 45
1 70
1 70
1 47
1 85
1 81
1 61
1 59
1 60
2 60
1 100
1 80
1 60
1 73
1 95
1 80
1 76
1 83
2 16
2 78
1 75
1 68
1 79
1 94
1 90
2 17
2 20
1 60
end
label values NBpriorBehSelf NBpriorBehSelf_lbl
label def NBpriorBehSelf_lbl 1 "yes", modify
label def NBpriorBehSelf_lbl 2 "no", modify

I really appreciate any advice while I learn and build my understanding of Stata's capabilities. If any further clarification or information would be useful, please let me know.

Best wishes,
Janina
Nick Cox

Join Date: Mar 2014

Posts: 35213
#2

15 Mar 2022, 13:15

You explain your data nicely and I think I understand them. I don't really follow the example graph and in any case think it goes too far from the actual data.

As I understand it, there is an observed percent of students who say they fight and then we are looking at what students think the observed percent is.

This plot is a little complicated, and could indeed be simplified, but the essence is to show the data directly, plus

observed percent as a reference line

summary boxes following box plot conventions (medians plus quartiles)

observed means as shorter reference lines.

I use scheme s1color.

I use stripplot from SSC. That can do various things, but the main idea used here goes back to Emanuel Parzen in 1979 (references in the help file). Oddly, or otherwise, I think he had a great idea to merge quantile and box plots, but it has had little impact partly because (a) his own data examples were not inspiring (b) he surrounded his message with elaborate mathematical details that were at best decorative and at worst obfuscating.
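Since stripplot is community-contributed, the code in this post assumes it has already been installed from SSC. A minimal setup sketch, assuming an internet connection and a Stata version that ships the s1color scheme:

```stata
* Install (or update) stripplot from SSC; needed once per Stata installation
ssc install stripplot, replace

* Use the scheme mentioned above for the rest of the session
set scheme s1color
```

After installation, help stripplot shows the options used here and the references mentioned.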

This design makes evident an elementary but fundamental notion behind the box plot:

If half the observations fall inside the box, then half fall outside it.

And often each half is every bit as interesting and important as the other.

There is secondary detail here too, such as the repetition of 60, 70, 80 and 90% as round-number guesses.

Code:

clear
* dataex NBpriorBehSelf NBpriorBehOthr
input byte(NBpriorBehSelf NBpriorBehOthr)
1 80
1 83
1 80
1 40
1 60
1 89
1 65
1 70
1 78
1 40
1 50
1 90
1 46
1 90
2 61
1 70
1 90
2 40
1 97
1 60
1 88
1 71
1 80
1 47
1 40
1 92
2 87
1 90
1 90
1 83
1 98
1 100
1 80
1 60
1 60
1 75
1 80
2 91
1 80
1 90
1 85
1 80
1 71
1 70
1 61
1 97
1 57
2 61
1 95
1 90
1 80
1 85
1 81
1 80
1 90
2 1
1 95
1 75
2 66
1 87
1 80
1 60
1 70
1 91
1 44
1 69
1 90
1 76
2 20
1 70
2 10
1 45
1 70
1 70
1 47
1 85
1 81
1 61
1 59
1 60
2 60
1 100
1 80
1 60
1 73
1 95
1 80
1 76
1 83
2 16
2 78
1 75
1 68
1 79
1 94
1 90
2 17
2 20
1 60
end
label values NBpriorBehSelf NBpriorBehSelf_lbl
label def NBpriorBehSelf_lbl 1 "yes", modify
label def NBpriorBehSelf_lbl 2 "no", modify

label var NBpriorBehSelf "Do you try to fight global warming?"
label var NBpriorBehOthr "% of students you think try to fight global warming"

egen pc_yes = mean(100 * (NBpriorBehSelf == 1))
su pc_yes, meanonly
local show : display %3.1f r(mean)

stripplot NBpriorBehOthr, over(NBpriorBehSelf) cumul cumprob vertical box centre ///
    refline(lc(orange)) yla(, ang(h) axis(2)) xla(, noticks) ///
    xli(1.5, lw(vthin) lc(gs8)) yli(`show', lc(blue)) yaxis(1 2) ///
    yla(`show' `" "actual" "`show'%" "', axis(1) ang(h)) ///
    ytitle("", axis(1)) ytitle("`: var label NBpriorBehOthr'", axis(2)) ///
    xsc(titlegap(*5))

Last edited by Nick Cox; 15 Mar 2022, 13:18.
Nick Cox

Join Date: Mar 2014

Posts: 35213
#3

16 Mar 2022, 04:51

On a second reading I am clearer on the graph shown in #1, code for which might be something like

Code:

egen pc_yes = mean(100 * (NBpriorBehSelf == 1))
gen wedge = NBpriorBehOthr - pc_yes
set scheme s1color
histogram wedge, start(-100) width(10) percent
Nick Cox

Join Date: Mar 2014

Posts: 35213
#4

16 Mar 2022, 07:56

Code:

egen pc_yes = mean(100 * (NBpriorBehSelf == 1))
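A note on why this expression works, offered as a sketch rather than as part of the original answer: in Stata a true-or-false expression such as NBpriorBehSelf == 1 evaluates to 1 or 0, so the mean of 100 times that expression is the percent answering yes. One caveat is that a missing answer makes the comparison false, so it would be counted as "not yes". The example data have no missing values, but the variant below leaves any missing answers out of the mean. The name pc_yes2 is hypothetical and is not used elsewhere in the thread.

```stata
* Sketch: exclude missing answers when computing the percent "yes"
* (pc_yes2 is a hypothetical name chosen for illustration)
egen pc_yes2 = mean(cond(missing(NBpriorBehSelf), ., 100 * (NBpriorBehSelf == 1)))

* Cross-check: compare with the percent shown for "yes" in the table
tabulate NBpriorBehSelf
summarize pc_yes2, meanonly
display %4.1f r(mean)
```

With no missing values, pc_yes and pc_yes2 should agree.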
Janina Gleed

Join Date: Feb 2022

Posts: 18
#5

16 Mar 2022, 12:51

Hi Nick, thank you very much for your feedback. I hadn’t considered a graph merging quantile and box plots as shown; on reflection, I don’t think I have come across such a graph in papers before, so thank you for highlighting the possibilities here. The way the design communicates a lot of relevant information is certainly interesting, so I will play around with it some more.

As a clarification for the graph provided as an example, the red vertical line indicates the actual share of people who say they try to fight global warming. This serves as a reference point against which deviations in beliefs about this share can be represented.
Nick Cox

Join Date: Mar 2014

Posts: 35213
#6

16 Mar 2022, 13:13

You could certainly add a vertical line to a histogram such as generated by the code in #3 and #4.

There aren't many quantile-box plots that I know of but the stripplot help gives some references.
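As a sketch of the vertical-line suggestion, assuming the example data from #1 are in memory: because the wedge is the guess minus the actual percent, the actual share sits at a wedge of zero, so the reference line goes at 0. The red colour and the axis title are illustrative choices, not taken from the thread.

```stata
* Start clean in case the variables were already created earlier
capture drop pc_yes
capture drop wedge

* Percent answering "yes", then each respondent's guess minus that percent
egen pc_yes = mean(100 * (NBpriorBehSelf == 1))
gen wedge = NBpriorBehOthr - pc_yes

* Histogram of the wedge with a vertical reference line at zero,
* i.e. at guesses that equal the actual percent
histogram wedge, start(-100) width(10) percent ///
    xline(0, lcolor(red)) ///
    xtitle("Wedge (guessed % - actual %)")
```

Bars to the left of the line are underestimates of the actual share and bars to the right are overestimates.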