  • Likert Scale graph bar

    Hello all,

    I am trying to create a Likert scale graph (the Likert scale has 5 categories: Always, Often, Sometimes, Rarely and Never). This is what I have managed to come up with so far (attached). I have used the following code:

    - graph hbar (asis) Always Often Sometimes Rarely Never, over(Items, label(labsize(vsmall))) over(District, label(labsize(small))) stack ytitle("percentage", size(small)) ysize(5) xsize(5) legend(size(vsmall)) title("Litter Items Frequency")

    The bars are really too small, so I was wondering if anyone can suggest how to increase the bar width and space the bars apart a bit, so that I can increase the font of the items listed on the x axis (from vsmall to small, ideally)?
    Thank you in advance for all your input

    Elena


[Attached image: Litter items frequency by district_v2.png]

  • #2
    You want to make bars bigger and further apart and the text bigger? Only by making the eventual image bigger, I suspect.

    Comment


    • #3
      Mengo, hi.

      An alternative is to customize all labels with smaller fonts. You can also add the aspectratio(#) and scale(#) options to your command (where # denotes a number, say, 0.75 or 0.5).
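      As a sketch only, those options could be appended to the command from #1 like this; the values aspectratio(1) and scale(0.8) are just examples to experiment with, not recommendations:

      Code:
      * sketch: aspectratio() and scale() added to the command from #1;
      * the values 1 and 0.8 are illustrative only -- adjust to taste
      graph hbar (asis) Always Often Sometimes Rarely Never, ///
          over(Items, label(labsize(vsmall))) ///
          over(District, label(labsize(small))) ///
          stack ytitle("percentage", size(small)) ///
          ysize(5) xsize(5) legend(size(vsmall)) ///
          title("Litter Items Frequency") ///
          aspectratio(1) scale(0.8)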

      Comment


      • #4
        Originally posted by Nick Cox View Post
        You want to make bars bigger and further apart and the text bigger? Only by making the eventual image bigger, I suspect.
        I need to add the graph to a scientific manuscript, and by making the image bigger it would become too large compared to the standard figure size, I am afraid. But thanks for the suggestion, Nick!

        Comment


        • #5
          For one, vertical district labels will free up some space without much affecting readability, but you need to present a data example for other suggestions (see FAQ Advice #12 for details).

          Code:
          over(District, label(angle(vert) labgap(0)))

          Comment


          • #6
            I agree with Andrew Musau in asking for a data example. Please show the results of

            Code:
            dataex Items District Always Often Sometimes Rarely Never
            which should be just 27 observations.

            My main concern is that you're tinkering with the design when the graph in #1 is more problematic than that. In principle, you are showing all the information. In practice, how easy is it for anyone to read off even broad contrasts between items and/or districts?

            Comment


            • #7
              I am still hopeful of seeing a data example here. Meanwhile I want to push an approach I have often found helpful.

              Plot grades against cumulative probabilities, except that you

              * should use midpoints of cumulative probabilities for each interval

              * might be well advised to use a transformed scale such as logit for the latter.

              The rationale for midpoints is best shown by example. Suppose 5 grades have probabilities

              0.1 0.2 0.4 0.2 0.1

              so that the cumulative probabilities are

              0.1 0.3 0.7 0.9 1 by a <= definition or

              0 0.1 0.3 0.7 0.9 by a < definition.

              Neither definition treats the categories symmetrically; in each there is one point -- 0 or 1 -- that is true by definition and not informative about the data; and in each there is one point -- again 0 or 1 -- that won't suit many possible transformations, such as logit.

              An easy compromise is to split the difference

              0.05 0.2 0.5 0.8 0.95

              -- an idea used many times, under many different names, of which ridit is perhaps the most common. (Irwin Bross, who introduced the term, gave a poker-faced rationalization at the time, often repeated verbatim in texts and papers, but much later revealed that the name honoured his wife Rida. Ridiculous, but true.)
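              The split-the-difference calculation above can be sketched in a few lines (a toy example using the probabilities just given; the variable names are mine, not from any package):

              Code:
              * toy sketch: midpoints of cumulative probabilities for the
              * 5-grade example above (variable names are illustrative)
              clear
              input double p
              0.1
              0.2
              0.4
              0.2
              0.1
              end
              gen double cum_le = sum(p)            // <= definition: .1 .3 .7 .9 1
              gen double cum_lt = cum_le - p        // <  definition: 0 .1 .3 .7 .9
              gen double mid = (cum_le + cum_lt)/2  // split the difference: .05 .2 .5 .8 .95
              list, noobs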

              The rationale for a transformed scale is that it often straightens curves.

              I pushed these little ideas in 2004 in https://journals.sagepub.com/doi/pdf...867X0400400209 and earlier at the London 2001 meeting https://www.stata.com/meeting/7uk/cox1.pdf

              This is a simple example I've used often. The calculations and graphs are simple but as a matter of convenience I use qplot and mylabels from the Stata Journal.

              Code:
              sysuse auto, clear

              mylabels 0.05 0.1(0.2)0.9 0.95, myscale(logit(@)) local(xla)

              qplot rep78, over(foreign) midpoint recast(connected) aspect(1) trscale(logit(@)) xla(`xla') xtitle(Cumulative probability (logit scale)) legend(order(2 1))

[Attached image: coxplot.png]



              So rep78 is an ordered (ordinal, graded) response (a Likert item, if you like it that way -- but strictly, not a Likert scale at all).

              foreign is a predictor, which happens to be binary, but anything categorical can be used in the same role.

              If there is no relationship between the predictor and the response, we expect curves that are essentially similar. If there is a relationship then we expect different curves. The curves might differ in level, slope, curvature and so forth -- but in this example there is a simple shift.


              Jargon that may be familiar is that one distribution stochastically dominates (is stochastically larger or higher than) another if the curves never cross. (I've often seen the idea explained with cumulative distribution functions, with the axes reversed from this plot, which seems more awkward than this representation. Higher means here what it says!)

              That is, as drawn here a quantile plot has values on the vertical axis and cumulative probability or plotting position on the horizontal axis.

              A(n) (empirical) (cumulative) distribution function plot -- in some fields ECDF plot is an established term -- has the axes reversed.

              Terminology is not quite standardized, which is occasionally puzzling, but not intensely problematic.

              So, what are the implications for the data in this thread -- for relative frequencies of litter items in three districts of Odisha in India? (I am a geographer, and more crucially just Googled a little like anybody else.)

              The 5 response variables need to be pushed through reshape long, a routine change of dataset layout.

              Then there are 3 x 9 possible curves, too many for a single graph, but a good solution can be glimpsed.

              My impression is that variations between different litter items are greater than variations between the areas, which are more subtle.

              The order of items at present is just alphabetical, which is what Stata will do by default. A more informative order would use some kind of mean or median: almost any summary of level will work better than alphabetical order. See https://journals.sagepub.com/doi/pdf/10.1177/1536867X211045582 for a detailed discussion of re-ordering categorical variables.

              So, what I would want to try -- given access to the data -- is to have 9 panels, one for each kind of litter, each showing curves for 3 areas.
              Last edited by Nick Cox; 23 Aug 2023, 16:10.

              Comment


              • #8
                Nick Cox Apologies for the slow reply. I have been thinking about how I want to present the data. In a nutshell, I have a dataset with answers to a question on the frequency (5-point Likert scale: 1 always to 5 never) of seeing marine plastic items. I have a graph presenting mean scores for the entire sample. I further ran a Kruskal-Wallis test and a Wilcoxon Man test for differences in responses at district level (3 districts, as shown in my first post) and by hours spent at sea (2 groups), and I would like to present these results graphically. I have attached the data below: for each marine litter item, the first row reports the mean score and the second row the standard deviation. Would the approach you showed above still be recommended? Would it be better to present the frequencies as percentages rather than mean scores? Thank you in advance for any suggestion! Elena
                Mean score, SD in parentheses:

                Marine Litter Items    1-8 hours    13-16 hours    Puri       Khurda     Ganjam
                Plastic bags           1.9 (1.3)    1.9 (1.1)      1.3 (0.8)  2.1 (1.3)  2.3 (1.3)
                Plastic bottles        1.5 (0.9)    1.8 (1.0)      1.1 (0.6)  1.7 (1.1)  1.8 (1.0)
                Food wrappers          1.9 (1.2)    1.7 (1.0)      1.4 (0.9)  1.9 (1.2)  2.3 (1.2)
                Fishing nets           4.1 (0.8)    4.4 (1.0)      4.2 (0.9)  4.2 (0.8)  4.1 (0.7)
                Fishing hooks/lines    4.5 (0.7)    4.8 (0.4)      4.6 (0.6)  4.6 (0.8)  4.7 (0.6)
                Fishing traps          4.4 (0.8)    4.7 (0.5)      4.3 (0.8)  4.5 (0.7)  4.6 (0.5)
                Synthetic ropes        3.1 (1.2)    2.5 (0.9)      3.1 (1.1)  2.7 (1.1)  3.0 (1.2)
                Metal cans             4.0 (1.3)    4.3 (1.3)      4.2 (1.2)  4.0 (1.4)  3.9 (1.5)
                Glass bottles          1.1 (0.6)    1.8 (1.1)      1.2 (0.7)  1.3 (0.8)  1.3 (0.8)

                Comment


                • #9
                  There is some mismatch of analysis choices here. If you think that means and standard deviations of scores make sense, then applying Kruskal-Wallis and Wilcoxon-Mann-Whitney (not Wilcoxon Man) is too pessimistic about the data. The other way round, if you think those are the only defensible tests, you may have a fight on your hands with examiners or reviewers (depending on who will be assessing your work), who may well draw the line at means and SDs for ordinal (ordered) scores or grades.

                  What I asked for in #6 is still what I am most interested in myself. Please note the request to use dataex as explained in FAQ Advice #12.

                  Sorry, but I can't make sense of what you're showing in #8. Again, please use dataex to show data examples.

                  Comment


                  • #10
                    Nick Cox please see below an example of what the raw data look like. 1 = Always, 2 = Never, 3 = Often, 4 = Rarely and 5 = Sometimes

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                     dataex District Plastic_bags Plastic_bottles Food_wrappers Fishing_nets Fishinghooks_lines Fishing_traps Synthetic_ropes Metal_cans Glass_bottles
                    
                    clear
                    input str6 District byte(Plastic_bags Plastic_bottles Food_wrappers Fishing_nets Fishinghooks_lines Fishing_traps Synthetic_ropes Metal_cans Glass_bottles)
                    "Puri"   1 1 1 2 2 2 4 4 5
                    "Puri"   1 1 1 2 2 2 2 1 1
                    "Puri"   1 1 1 2 2 2 1 1 1
                    "Puri"   1 1 1 2 2 2 5 5 1
                    "Puri"   2 4 4 1 2 2 2 2 4
                    "Puri"   3 1 3 4 2 2 4 2 1
                    "Puri"   3 1 1 4 4 2 4 2 1
                    "Puri"   1 1 1 4 4 4 3 4 1
                    "Puri"   1 1 5 4 4 2 3 2 1
                    "Puri"   1 1 5 2 5 5 2 2 5
                    "Khurda" 1 1 1 4 2 2 4 2 1
                    "Khurda" 1 1 1 2 2 2 4 4 1
                    "Khurda" 4 2 4 5 2 2 4 2 1
                    "Khurda" 4 5 5 5 4 4 2 5 1
                    "Khurda" 1 1 1 5 4 4 3 4 1
                    "Khurda" 1 1 1 4 2 4 4 4 1
                    "Khurda" 1 1 1 4 4 4 3 2 1
                    "Khurda" 1 1 1 4 2 4 4 1 1
                    "Khurda" 1 1 1 4 4 4 2 4 1
                    "Khurda" 1 1 1 2 4 2 2 4 1
                    "Ganjam" 4 1 1 1 2 4 4 4 1
                    "Ganjam" 4 1 1 4 2 4 4 1 1
                    "Ganjam" 4 1 1 4 2 2 2 4 1
                    "Ganjam" 2 1 1 4 2 2 4 1 1
                    "Ganjam" 4 1 1 4 2 2 2 2 1
                    "Ganjam" 1 1 1 4 2 2 1 1 1
                    "Ganjam" 1 1 1 4 2 2 2 2 1
                    "Ganjam" 1 1 1 4 2 2 1 4 1
                    "Ganjam" 4 1 1 4 2 2 2 2 1
                    "Ganjam" 4 1 1 4 2 4 2 4 1
                    "Ganjam" 1 1 1 4 2 2 2 1 1
                    end

                    Hope this is more useful
                    Elena

                    Comment


                    • #11
                      That's not what you used to create the graph in #1, which would be more helpful. It's your project with your choices, and I am just a volunteer, but I don't have to give more time to this if the challenge keeps changing. Sorry if that is disappointing.

                      Comment


                      • #12
                        No disappointment at all. In the first graph I showed the frequency distribution of percentages for the whole sample. My understanding from the dataex FAQ is that "a copy of 20 or so observations from your dataset is enough to show your problem", so this is what I did in my previous post using the raw data in my dataset. My problem is how to show via a graph the frequencies or mean scores of litter items by respondents' district. Thank you anyway for your help Nick Cox, and apologies for the confusion.
                        Elena

                        Comment


                        • #13
                          Indeed; that's general advice, but my personal focus here -- for my gratification as well as possibly your interest -- is to have access to the 27 observations (3 districts, 9 items) on 7 variables as used for #1.

                          The means could be shown using graph dot, or more flexibly in some ways using scatter.

                          Here is some technique. myaxis is from the Stata Journal.

                          Code:
                          clear 
                          
                          input str6 District byte(Plastic_bags Plastic_bottles Food_wrappers Fishing_nets Fishinghooks_lines Fishing_traps Synthetic_ropes Metal_cans Glass_bottles)
                          "Puri"   1 1 1 2 2 2 4 4 5
                          "Puri"   1 1 1 2 2 2 2 1 1
                          "Puri"   1 1 1 2 2 2 1 1 1
                          "Puri"   1 1 1 2 2 2 5 5 1
                          "Puri"   2 4 4 1 2 2 2 2 4
                          "Puri"   3 1 3 4 2 2 4 2 1
                          "Puri"   3 1 1 4 4 2 4 2 1
                          "Puri"   1 1 1 4 4 4 3 4 1
                          "Puri"   1 1 5 4 4 2 3 2 1
                          "Puri"   1 1 5 2 5 5 2 2 5
                          "Khurda" 1 1 1 4 2 2 4 2 1
                          "Khurda" 1 1 1 2 2 2 4 4 1
                          "Khurda" 4 2 4 5 2 2 4 2 1
                          "Khurda" 4 5 5 5 4 4 2 5 1
                          "Khurda" 1 1 1 5 4 4 3 4 1
                          "Khurda" 1 1 1 4 2 4 4 4 1
                          "Khurda" 1 1 1 4 4 4 3 2 1
                          "Khurda" 1 1 1 4 2 4 4 1 1
                          "Khurda" 1 1 1 4 4 4 2 4 1
                          "Khurda" 1 1 1 2 4 2 2 4 1
                          "Ganjam" 4 1 1 1 2 4 4 4 1
                          "Ganjam" 4 1 1 4 2 4 4 1 1
                          "Ganjam" 4 1 1 4 2 2 2 4 1
                          "Ganjam" 2 1 1 4 2 2 4 1 1
                          "Ganjam" 4 1 1 4 2 2 2 2 1
                          "Ganjam" 1 1 1 4 2 2 1 1 1
                          "Ganjam" 1 1 1 4 2 2 2 2 1
                          "Ganjam" 1 1 1 4 2 2 1 4 1
                          "Ganjam" 4 1 1 4 2 2 2 2 1
                          "Ganjam" 4 1 1 4 2 4 2 4 1
                          "Ganjam" 1 1 1 4 2 2 2 1 1
                          end
                          
                          gen id = _n 
                          local i = 1 
                          foreach v in Plastic_bags Plastic_bottles Food_wrappers Fishing_nets Fishinghooks_lines Fishing_traps Synthetic_ropes Metal_cans Glass_bottles { 
                              local label = subinstr("`v'", "_", " ", .)
                              label def Item `i' "`label'", add 
                              local ++i 
                          }
                          
                          ds id District, not 
                          
                          preserve 
                          
                          rename (`r(varlist)') (Grade#), addnumber 
                          
                          reshape long Grade, i(id) j(which) 
                          
                          label val which Item 
                          
                          myaxis Which=which, sort(mean Grade)
                          
                          collapse Grade, by(District Which) 
                          gen D = substr(District, 1, 1)
                          
                          separate Grade, by(D) veryshortlabel 
                          
                          local opts ms(none) mla(D) mlabpos(0) mlabsize(medlarge)
                          
                          scatter Which Grade1, `opts' mlabc(black)  ///
                          || scatter Which Grade2, `opts' mlabc(red) ///
                          || scatter Which Grade3, `opts' mlabc(blue) ///
                          yla(1/9, grid valuelabel tlc(none) glw(medthin) glp(solid) glc(gs12)) ysc(r(0.8 .)) ytitle(which) ysc(reverse) ytitle("") xtitle(mean grade) legend(off) 
                          
                          restore



[Attached image: scores.png]

                          Comment
