Bar graph with standard errors for survey data

Kerstin Schmidt

Join Date: Apr 2017
Posts: 120

24 May 2017, 09:05

Dear statalist,

I am trying to draw bar graphs with standard errors in Stata 14 using survey data.
The easiest way I can think of is the following:

Code:

*Convert pweights into fweights
local k = 2
gen fwt = round(10^(`k')*weight,1)
*Collapse data
collapse (mean) meanX = X  (sd) sdX = X (count) n = X [fw=fwt], by(region)
*Upper and lower values of confidence interval
gen hiX = meanX + invttail(n-1,0.025)* sdX / sqrt(n)
gen lowX = meanX - invttail(n-1,0.025)* sdX / sqrt(n)
*create bar graph
graph twoway (bar meanX region if region==1) ///
                     (bar meanX region if region==2) ///
                     (bar meanX region if region==3) ///
                     (bar meanX region if region==4)

Is this right or am I missing anything? This way the upper and lower bound of the CI seem really close.

Thanks!

Nick Cox

Join Date: Mar 2014

Posts: 35698
#2

24 May 2017, 10:52

In Stata terms you certainly need to start with twoway bar and add confidence intervals with other twoway commands if that is what you want.

Some general terms here are detonator and dynamite plots.

http://www.statalist.org/forums/foru...ver-dichotomic is a recent thread with links.

http://biostat.mc.vanderbilt.edu/wik...de/Poster3.pdf is direct and devastating.

Neither of these links offers detailed advice on tweaks with survey data. I would use mean's saved results to get confidence intervals and then plot those on graphs showing all the data.
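A minimal sketch of that suggestion, using Stata's highschool example data because #1 gives no data example. It compares the design-based standard error from svy: mean with the one produced by the fweight conversion in #1. The names svy_tab and fw_tab are made up for illustration, and sampwgt is assumed to be the pweight variable in that dataset (the svyset line will confirm it). Frequency weights are treated as counts of identical observations, so multiplying the weights by 10^k inflates n, which would be one reason the intervals in #1 look so tight.

```stata
webuse highschool, clear
svyset                       // show the survey design stored with the data

* Design-based estimate: SEs respect strata, clusters and pweights
svy: mean weight, over(sex)
matrix svy_tab = r(table)    // row 1 = mean, 2 = SE, 5 = lower limit, 6 = upper limit
matrix list svy_tab

* The conversion from #1, for comparison
* (assumption: sampwgt is the pweight variable; check the svyset output)
local k = 2
gen long fwt = round(10^(`k')*sampwgt, 1)
mean weight [fw=fwt], over(sex)
matrix fw_tab = r(table)

* Standard errors for the first group, side by side
display "SE, svy: " svy_tab[2,1] "   SE, fweights: " fw_tab[2,1]
```

The limits in rows 5 and 6 of svy_tab are the ones to carry into a graph.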

Chris Larkin

Join Date: Apr 2016

Posts: 296
#3

24 May 2017, 11:37

cibar, a new-ish package, written by Alexander Staudt, might provide what you want. Type ssc install cibar, and read the help file. Without a snippet of your data I can't be sure (incidentally please ssc install dataex and use it), but the code you want is probably something like this

Code:

* run on the original (uncollapsed) data; X and fwt as in #1
cibar X [fweight=fwt], over1(region)

Last edited by Chris Larkin; 24 May 2017, 11:45. Reason: graphregion(col(white)) does not work as an option. I forgot that i'd edited the ado when I installed it

Kerstin Schmidt

Join Date: Apr 2017
Posts: 120

25 May 2017, 01:52

I just saw that I forgot to include the last line of the code I am currently using. So the right one is:

Code:

  *Convert pweights into fweights local k = 2 gen fwt = round(10^(`k')*weight,1) *Collapse data collapse (mean) meanX = X  (sd) sdX = X (count) n = X [fw=fwt], by(region) *Upper and lower values of confidence interval gen hiX = meanX + invttail(n-1,0.025)* sdX / sqrt(n) gen lowX = meanX - invttail(n-1,0.025)* sdX / sqrt(n) *create bar graph graph twoway (bar meanX region if region==1) ///                      (bar meanX region if region==2) ///                      (bar meanX region if region==3) ///                      (bar meanX region if region==4) ///                      (rcap hiX lowX region)

Can I use this one? PS: I want to stick to bar graphs, so the alternative of dotplot is not what I am looking for

Kerstin Schmidt

Join Date: Apr 2017

Posts: 120
#5

25 May 2017, 01:57

Ahhh, I don't know why the code is being displayed in one row now.
I just added one line at the bottom:

Code:

(rcap hiX lowX region)

Can I use this with the code posted above ?
PS: I want to stick to bar graphs, so the alternative of dotplots is not what I am looking for

Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

25 May 2017, 02:27

Your preferences are yours to follow, but as other people may be interested in this thread, I'll pursue #2. The Statalist link there is important to understand what I'm doing.

Lacking a data example in #1 I turned to Stata's own examples. The help for mean includes this code

Code:

webuse highschool, clear
svy: mean weight
svy: mean weight, over(sex)

so I started to play with plotting the data.

The first discoveries, not surprisingly, are that the sample sizes are so large that standard errors or confidence intervals barely show on graphs and that weight is right-skewed, so comparisons on a transformed scale make more sense.

Code:

webuse highschool, clear
gen log_weight = log(weight)
svy: mean log_weight, over(sex)
mat table = r(table)
local mean1 = table[1,1]
local mean2 = table[1,2]
local ll1 = table[5,1]
local ll2 = table[5,2]
local ul1 = table[6,1]
local ul2 = table[6,2]

* install with -ssc inst mylabels-
mylabels 100(50)300, myscale(log(@)) local(yla)

* install from SJ site
qplot log_weight, over(sex) trscale(invnormal(@)) yla(`yla') aspect(1) ///
ytitle(Weight (pounds)) mc(blue red) ///
addplot(scatteri `mean1' -4 `mean1' 4, recast(line) lcolor(blue) ///
|| scatteri `mean2' -4 `mean2' 4, recast(line) lcolor(red)) ///
xtitle(normal quantile) yla(, ang(h)) ///
legend(order(1 2) pos(11) ring(0) col(1)) note(lines show geometric means)

Although I won't plot confidence intervals, the code above gives examples of retrieving them from Stata's results.
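For anyone who does want the intervals drawn, here is a minimal sketch that assumes the same highschool data and the raw weight scale. It copies the means and limits from r(table) into a two-row dataset and overlays twoway rcap on twoway bar. The variable names meanw, ll and ul and the matrix name T are made up for illustration, and the value labels assume sex is coded 1 = male, 2 = female.

```stata
webuse highschool, clear
svy: mean weight, over(sex)
matrix T = r(table)          // row 1 = mean, 5 = lower limit, 6 = upper limit

* One row per group, holding the design-based results
clear
set obs 2
gen byte sex = _n
gen meanw = T[1, _n]
gen ll    = T[5, _n]
gen ul    = T[6, _n]

* Assumption: 1 = male, 2 = female, as in the original data
label define sexlbl 1 "male" 2 "female"
label values sex sexlbl

twoway (bar meanw sex, barwidth(0.6)) ///
       (rcap ul ll sex), ///
       xlabel(1 2, valuelabel) ytitle("Mean weight (pounds)") legend(off)
```

With samples this large the caps will sit very close to the bar tops, which is the first discovery noted above.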

So, this is what I found:

The moral is no more than any introductory text should explain. If you leap towards highly reduced summaries such as means +/- SE, you may miss structure in the data that could be interesting or important. Researchers in the field deserve the display even if they dismiss it as detail. There is more going on there than just the shift of distributions shown most strongly in the middle.

Kerstin Schmidt

Join Date: Apr 2017

Posts: 120
#7

25 May 2017, 02:37

@ Chris: The cibar package also looks good. What is the difference in the calculation between the cibar command and the code(s) in #1 and #5 I posted above?