svy: mean and sorting estimated means

Erika Sanborne

Join Date: Oct 2019
Posts: 9

30 Oct 2021, 04:55

Greetings, Statalisters.

I'm hoping someone is willing and able to explain this to me, since I've failed to figure it out on my own. I need to sort estimated means into descending order, per year. I'm using the Gallup World Poll; the estimated means are of an index, by country per year, and respondent-level sampling weights need to be part of the computation of the means to account for the survey design. Since I suspect sharing a dataex of that would make my question more complicated, the following mock-up shows the situation and the desired outcome.

I start with this dataset:

Code:

 clear all
 webuse highschool, clear
 svyset [pweight=sampwgt]

* making a fake year variable      
set seed 12345
generate rannum  = uniform()
sort rannum
generate year = .
lab var year "grad year"
drop rannum
replace year = 2009 in 1/999
replace year = 2010 in 1000/1999
replace year = 2011 in 2000/2999
replace year = 2012 in 3000/4071

* making a fake outcome variable 
set seed 54321
generate rannum  = uniform()
sort rannum
generate happy = .
lab var happy "happiness index"
drop rannum
replace happy = 1 in 1/700
replace happy = 2 in 701/2200
replace happy = 3 in 2201/4071
label define happy 1 "unhappy" 2 "neutral" 3 "happy"
label values happy happy
codebook, compact

I try this first:

Code:

*attempt 1 (using svy:mean with subpop for the if statement restricting year)
svy, subpop(if year==2009): mean happy, over(race) coeflegend

Which produces this:

Code:

(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1        Number of obs   =      4,071
Number of PSUs   =   4,071        Population size =  8,000,000
                                  Subpop. no. obs =        999
                                  Subpop. size    =  2,016,463
                                  Design df       =      4,070

------------------------------------------------------------------------------
             |       Mean  Legend
-------------+----------------------------------------------------------------
c.happy@race |
      White  |   2.325196  _b[c.happy@1bn.race]
      Black  |   2.207406  _b[c.happy@2.race]
      Other  |    2.30351  _b[c.happy@3.race]
------------------------------------------------------------------------------

I try this next:

Code:

*attempt 2 (using weights with arithmetic and stock Stata)
sort race year
by race year: gen meanHappy = sum(happy* sampwgt) / sum(sampwgt)
by race year: replace meanHappy=meanHappy[_N]
tabstat meanHappy if year==2009, statistics(mean) by(race) columns(statistics)

Which produces a very similar, although slimmer, output:

Code:

Summary for variables: meanHappy
     by categories of: race (1=white, 2=black, 3=other)

  race |      mean
-------+----------
 White |  2.325196
 Black |  2.207406
 Other |  2.303509
-------+----------
 Total |  2.312006
------------------

What I need to somehow produce (using this silly demo example):

Code:

* a tabulation that sorts these means in descending order
 White |  2.325196
 Other |  2.303509
 Black |  2.207406

Anyone? I realize it looks ridiculous with this demo, but what I have is eleven years of 200 national means that I need to sort from highest to lowest each year. If I could make this demo work, I could make that work too. Thanks in advance for your time.

Cheers,
Erika

Editing to add: I also will need to include a measure of precision of the estimates of these means - i.e. standard error of the mean. Haven't looked at that yet, since I'm stuck on this sorting task, but that's next and probably related.

Last edited by Erika Sanborne; 30 Oct 2021, 04:58.

I am using Stata SE 16.1.

Tags: matrix, sort descending, sorting means, survey, svy:mean

Fei Wang

Join Date: Oct 2021

Posts: 726
#2

30 Oct 2021, 07:20

Erika, based on your data, I can sort mean happiness by race within each year.

Code:

collapse (mean) meanHappy = happy [pw=sampwgt], by(year race)
gsort year -meanHappy
bys year: list race meanHappy

Unfortunately, collapse does not allow pweights with its standard-error-of-the-mean statistic (semean).
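
Because collapse replaces the data in memory, it may help to wrap it in preserve/restore so the respondent-level data survive for later steps. The following is a minimal sketch, assuming the mock-up data (year, race, happy, sampwgt) are in memory; the variable rankHappy is a name introduced here for illustration only.

```stata
* Assumes the mock-up data (year, race, happy, sampwgt) are in memory.
preserve                                      // collapse overwrites the data in memory
collapse (mean) meanHappy = happy [pw=sampwgt], by(year race)

* rankHappy is a hypothetical helper: 1 = highest weighted mean within a year
egen rankHappy = rank(-meanHappy), by(year)
sort year rankHappy
list year race meanHappy rankHappy, sepby(year) noobs

restore                                       // bring back the respondent-level data
```

Ranking with egen and then sorting on the rank avoids relying on how by-groups interact with a descending sort; tied means receive averaged ranks.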

Last edited by Fei Wang; 30 Oct 2021, 07:22.
Fei Wang

Join Date: Oct 2021
Posts: 726

30 Oct 2021, 08:41

The mean command allows pweights for both the mean and the standard error of the mean. The following code works, though I believe it can be simplified.

Code:

mean happy [pw=sampwgt] if year == 2009, over(race)

* save results in matrix R
    mat R = r(table)[1..2,1...]'
    local rown: rown R

* convert R into a dataset
    svmat R, n(col)
    keep b se
    drop if b == .

* add race variable
    gen race = ""
    local line = 1
    foreach str in `rown' {
        replace race = "`str'" in `line'
        local ++line
    }
    destring race, ignore("c.happy @ .race bn") replace
    lab values race race

* sorting
    gsort -b
    ren (b se) (meanHappy semHappy)
    list race meanHappy semHappy
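
One possible way to cover every year at once, and to avoid parsing the coefficient names, is to loop over the year and race levels and post each estimate to a results file. This is a sketch under stated assumptions rather than a tested solution: it assumes the mock-up data are in memory, that svyset has already been run as in the first post, and that race is coded 1 = white, 2 = black, 3 = other as the tabstat header indicates. The macro names post and results are introduced here for illustration.

```stata
* Assumes the mock-up data are in memory and svyset [pweight=sampwgt] was run.
tempname post
tempfile results
postfile `post' int year byte race double(meanHappy semHappy) using `results'

levelsof year, local(years)
levelsof race, local(races)
foreach y of local years {
    foreach r of local races {
        * one year-by-race cell at a time, treated as a subpopulation
        capture svy, subpop(if year == `y' & race == `r'): mean happy
        if _rc continue                       // skip cells that cannot be estimated
        post `post' (`y') (`r') (_b[happy]) (_se[happy])
    }
}
postclose `post'

* load the posted estimates and sort them, highest mean first within each year
use `results', clear
label define race 1 "White" 2 "Black" 3 "Other"   // assumed coding, see lead-in
label values race race
gsort year -meanHappy
list year race meanHappy semHappy, sepby(year) noobs
```

Using svy with subpop() keeps the full design in the variance calculation, so the standard errors can differ from those produced by mean with an if qualifier. With many groups the loop issues one svy call per cell, which is slower but keeps each step easy to check.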

Erika Sanborne

Join Date: Oct 2019

Posts: 9
#4

31 Oct 2021, 01:36

Originally posted by Fei Wang:

Erika, based on your data, I can sort mean happiness by race within each year.

Code:

collapse (mean) meanHappy = happy [pw=sampwgt], by(year race)
gsort year -meanHappy
bys year: list race meanHappy

Unfortunately, collapse does not allow pweights with its standard-error-of-the-mean statistic (semean).

Thanks so much, Fei. This does exactly what I need in terms of ranking means with pweight applied. Unfortunately, it seems to apply a casewise deletion by default, which (in my actual data set) drops several "races" from the list. I looked at -help collapse- docs, and I see an option to specify -cw- for casewise, but no option to specify anything else (i.e. to not delete)...Hmm. I will need to look at my data and find out why this is happening. I realize you cannot possibly guess, since I didn't provide dataex from my (Gallup World Poll) data here. I will figure out this part. The sorting was the obstacle.
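
One possible cause, offered only as an assumption since the actual data are not shown, is that some groups have no observation with both a non-missing index and a usable (non-missing, positive) weight. A quick diagnostic sketch using the mock-up variable names follows; usable, nUsable, and firstInCell are hypothetical helper variables.

```stata
* Hypothetical diagnostic; variable names follow the mock-up, not the real data.
* Flag observations that have both an outcome and a positive, non-missing weight.
generate byte usable = !missing(happy, sampwgt) & sampwgt > 0

* Count usable observations in each year-by-race cell.
bysort year race: egen nUsable = total(usable)

* Show one line per cell that has no usable observations at all.
egen byte firstInCell = tag(year race)
list year race nUsable if firstInCell & nUsable == 0, noobs
```

In the mock-up this should list nothing, because every cell has complete data; in real data, any cell listed here would be a candidate for the groups that go missing.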

Originally posted by Fei Wang:

The mean command allows pweights for both the mean and the standard error of the mean. The following code works, though I believe it can be simplified.

Thanks for this too. This makes sense to me. And I'm not sure how to simplify it, so I'm going with it since it seems to work well. Appreciate your functional response here. Thanks, mate.

I am using Stata SE 16.1.