Adding custom column labels to a graph bar with over() over()

Helder Costa

25 Feb 2025, 07:35

Hello,

I'm plotting regression coefficients in a grouped bar graph. I would like to add the statistical significance of each coefficient as a label on its bar. I know how to do this in a twoway plot using the mlabel() option, but then I lose the ability to use over() over() to create my grouped bar graph. My real data has variables for the regression coefficients and the corresponding significance levels (e.g., "***", "**", "*"). Does anyone know how to add a custom label using the graph bar command, or how to get the effect of over() over() in a twoway graph while also using mlabel()? An example of my data structure and the desired grouped bar output (without the significance stars) follows:

Code:

sysuse nlsw88, clear
collapse wage, by(smsa married collgrad)
reshape wide wage, i(married collgrad) j(smsa)
gen stars0 = "***"
gen stars1 = "**"
graph bar wage0 wage1, ///
    over(married) over(collgrad)

Thank you very much in advance for any insights.

Best,
Hélder

Nick Cox

26 Feb 2025, 04:17

I don't have an easy solution for you. Various tricks for having it both ways -- grouping on one axis and the flexibility of twoway -- have been covered in

SJ-8-2 gr0034 . . . . . . . . . . Speaking Stata: Between tables and graphs
(help labmask, seqvar if installed) . . . . . . . . . . . . N. J. Cox
Q2/08 SJ 8(2):269--289
outlines techniques for producing table-like graphs

https://journals.sagepub.com/doi/pdf...6867X241297949

https://www.statalist.org/forums/for...dable-from-ssc

and no doubt in other places.

I got to here working ad hoc.

Code:

sysuse nlsw88, clear

collapse wage, by(smsa married collgrad)
reshape wide wage, i(married collgrad) j(smsa)

gen stars0 = "***"

gen stars1 = "**"

graph bar wage0 wage1, ///
    over(married) over(collgrad) name(target, replace)
    
* I start here 

gen x = real(word("1 3.5 2 4.5", _n))
gen xL = x - 0.2
gen xR = x + 0.2

twoway bar wage0 xL, base(0) barw(0.4) || bar wage1 xR, base(0) barw(0.4) ///
xla(1 "Single" 2 "Married" 3.5 "Single" 4.5 "Married", tlc(none) tlength(*0.1)) ///
xmla(1.5 "Not college grad" 4 "College grad", labsize(medium) tlength(*5) tlc(none)) ///
text(13 1.5 "Something") text(13 4 "interesting") ysc(r(0 14)) yla(0(2)12)

[Image: costa1.png]


Nick Cox

26 Feb 2025, 04:28

On stars: I would like to return to my post https://www.stata.com/statalist/arch.../msg00646.html which I here reproduce but with historical parts corrected (in italic).

The practice of significance starring goes back about 80 years or more. I am aware of earlier uses and would be interested in much earlier uses, but a good starting point is to find

Yates, F. 1937. The Design and Analysis of Factorial Experiments. Technical Communication No 35, Imperial Bureau of Soil Science, Harpenden.

and see how * and ** were used to mark footnotes that explained significance at different levels.

It seems particularly ironic that it was Yates who used this practice prominently. Yates himself was no friend of significance testing and even criticised Fisher (his mentor and collaborator) in his obituary notice for over-emphasis on tests.

It may satisfy purely tribal imitation, namely doing just what other people do, but starring seems objectionable on several grounds:

1. If the P-value is worth printing, it is evidence in itself and need not be degraded by categorisation. If the implication is that a table of many P-values is too detailed or too charmless to be readily assimilated without decoration, then it should be replaced by a graphical display (which could include numerical labels).

2. Starring might be defended on the grounds that it indicates which hypotheses we would reject at a variety of different levels. But that would be playing several different games at once. Good conservative practice if you believe that significance testing is a good idea is to use one threshold level that you regard as appropriate, not two or more simultaneously. And once you entertain several hypotheses simultaneously, as is usually implicit in contemplation of a table with several P-values, multiplicity complicates the issue mightily (as indeed is often, but not always, recognised).

3. All calculations in (for example) a regression are conditional on assumptions being satisfied, assumptions that we usually should regard as suspect at the best of times. Loosely, we would normally regard coefficient estimates as being more reliable than standard errors which in turn are more reliable than P-values. Why many analysts should habitually choose to subject the least reliable part of the modelling results to the most intense scrutiny is a deep puzzle.

4. Significance is, or should be, always a lesser deal than strength of relationship or magnitude of effect. (If not, your sample size is too small.) Only the other day someone asked me privately to add starring to one of my own programs and gave as exemplar some output in which a correlation of 0.0753 was starred. Your view may well differ, but I have never yet found a correlation of that magnitude worth any consideration. Being assured that it really is not zero is not very interesting or helpful to me. Thus starring seems to me to encourage the wrong kind of scrutiny.

I flag that Peter Sprent epitomised starring as "more appropriate to a hotel guide-book than a serious scientific paper" (JRSS A 1970 p.143).

Helder Costa

26 Feb 2025, 06:48

Thank you Nick, I think your solution works perfectly. Adding mlabel to your twoway bar solves the issue:

Code:

sysuse nlsw88, clear

collapse wage, by(smsa married collgrad)
reshape wide wage, i(married collgrad) j(smsa)

gen stars0 = "***"

gen stars1 = "**"

graph bar wage0 wage1, ///
    over(married) over(collgrad) name(target, replace)
    
* I start here

gen x = real(word("1 3.5 2 4.5", _n))
gen xL = x - 0.2
gen xR = x + 0.2

twoway bar wage0 xL, base(0) barw(0.4) mlabel(stars0) || bar wage1 xR, base(0) barw(0.4) mlabel(stars1) ///
xla(1 "Single" 2 "Married" 3.5 "Single" 4.5 "Married", tlc(none) tlength(*0.1)) ///
xmla(1.5 "Not college grad" 4 "College grad", labsize(medium) tlength(*5) tlc(none)) ///
ysc(r(0 14)) yla(0(2)12)
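
In case mlabel() is not accepted by twoway bar in a given Stata version, a variant that may be worth trying is to overlay scatter plots with invisible markers and let those carry the labels. This is only a sketch under assumptions: the stars are the same placeholder strings as above, the x positions are computed from married and collgrad (which should reproduce the values 1, 3.5, 2, 4.5 used above, given that both variables are coded 0/1), and the legend and label styling choices are mine rather than part of Nick's solution.

```stata
sysuse nlsw88, clear

collapse wage, by(smsa married collgrad)
reshape wide wage, i(married collgrad) j(smsa)

* placeholder stars, as in the example above; real data would supply its own
gen stars0 = "***"
gen stars1 = "**"

* assumption: married and collgrad are 0/1, so this gives 1, 2, 3.5, 4.5
gen x = 1 + married + 2.5 * collgrad
gen xL = x - 0.2
gen xR = x + 0.2

* bars first, then scatter with no visible marker to place the labels
* at 12 o'clock above each bar top
twoway bar wage0 xL, base(0) barw(0.4) ///
    || bar wage1 xR, base(0) barw(0.4) ///
    || scatter wage0 xL, ms(none) mlabel(stars0) mlabpos(12) mlabcolor(black) ///
    || scatter wage1 xR, ms(none) mlabel(stars1) mlabpos(12) mlabcolor(black) ///
    xla(1 "Single" 2 "Married" 3.5 "Single" 4.5 "Married", tlc(none) tlength(*0.1)) ///
    xmla(1.5 "Not college grad" 4 "College grad", labsize(medium) tlength(*5) tlc(none)) ///
    ysc(r(0 14)) yla(0(2)12) xtitle("") legend(order(1 2))
```

Here legend(order(1 2)) keeps only the two bar series in the legend, so the label-carrying scatter plots do not add entries of their own.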
