  • svy: mean and sorting estimated means

    Greetings, Statalisters.

    I'm hoping someone is willing and able to explain this to me, since I've failed to figure it out on my own. I need to sort estimated means into descending order, per year. I'm using the Gallup World Poll, and the estimated means are of an index, by country per year. Respondent-level sampling weights need to be part of the computation of the means to account for the survey design. Since I suspect sharing a dataex of that would make my question more complicated, the following mockup shows the situation and the desired outcome.

    I start with this dataset:
    Code:
     clear all
     webuse highschool, clear
     svyset [pweight=sampwgt]
    
    * making a fake year variable      
    set seed 12345
    generate rannum  = uniform()
    sort rannum
    generate year = .
    lab var year "grad year"
    drop rannum
    replace year = 2009 in 1/999
    replace year = 2010 in 1000/1999
    replace year = 2011 in 2000/2999
    replace year = 2012 in 3000/4071
    
    * making a fake outcome variable 
    set seed 54321
    generate rannum  = uniform()
    sort rannum
    generate happy = .
    lab var happy "happiness index"
    drop rannum
    replace happy = 1 in 1/700
    replace happy = 2 in 701/2200
    replace happy = 3 in 2201/4071
    label define happy 1 "unhappy" 2 "neutral" 3 "happy"
    label values happy happy
    codebook, compact
    I try this first:
    Code:
    *attempt 1 (using svy:mean with subpop for the if statement restricting year)
    svy, subpop(if year==2009): mean happy, over(race) coeflegend
    Which produces this:
    Code:
    (running mean on estimation sample)
    
    Survey: Mean estimation
    
    Number of strata =       1        Number of obs   =      4,071
    Number of PSUs   =   4,071        Population size =  8,000,000
                                      Subpop. no. obs =        999
                                      Subpop. size    =  2,016,463
                                      Design df       =      4,070
    
    ------------------------------------------------------------------------------
                 |       Mean  Legend
    -------------+----------------------------------------------------------------
    c.happy@race |
          White  |   2.325196  _b[c.happy@1bn.race]
          Black  |   2.207406  _b[c.happy@2.race]
          Other  |    2.30351  _b[c.happy@3.race]
    ------------------------------------------------------------------------------
    I try this next:
    Code:
    *attempt 2 (using weights with arithmetic and stock Stata)
    sort race year
    by race year: gen meanHappy = sum(happy* sampwgt) / sum(sampwgt)
    by race year: replace meanHappy=meanHappy[_N]
    tabstat meanHappy if year==2009, statistics(mean) by(race) columns(statistics)
    Which produces a very similar, although slimmer, output:
    Code:
    Summary for variables: meanHappy
         by categories of: race (1=white, 2=black, 3=other)
    
      race |      mean
    -------+----------
     White |  2.325196
     Black |  2.207406
     Other |  2.303509
    -------+----------
     Total |  2.312006
    ------------------

    What I need to somehow produce (using this silly demo example):


    Code:
    * a tabulation that sorts these means in descending order
     White |  2.325196
     Other |  2.303509
     Black |  2.207406
    Anyone? I realize it looks ridiculous with this demo, but what I have is eleven years of 200 national means that I need to sort from highest to lowest each year. If I could make this demo work, I could make that work too. Thanks in advance for your time.

    Cheers,
    Erika

    Editing to add: I also will need to include a measure of precision of the estimates of these means - i.e. standard error of the mean. Haven't looked at that yet, since I'm stuck on this sorting task, but that's next and probably related.
    Last edited by Erika Sanborne; 30 Oct 2021, 04:58.
    I am using Stata SE 16.1.

  • #2
    Erika, based on your data, I can sort mean happiness by race within each year.

    Code:
    collapse (mean) meanHappy = happy [pw=sampwgt], by(year race)
    gsort year -meanHappy
    bys year: list race meanHappy
    Unfortunately, collapse cannot give you the standard error of the mean here: its (semean) statistic is not allowed with pweights.
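
    For design-based standard errors, one option is to loop over year and group with svy, subpop() and post each estimate to a results file. This is a minimal sketch, not a tested solution for the Gallup data. It assumes the svyset from #1, uses simpler stand-in year and happy variables than the mock-up (so the numbers will differ), and re-creates the race value label by hand.

    ```stata
    webuse highschool, clear
    svyset [pweight=sampwgt]

    * stand-in variables (assumptions, simpler than the mock-up in #1)
    set seed 12345
    generate year  = 2009 + floor(4*runiform())
    generate happy = 1 + floor(3*runiform())

    * results file: one row per year-by-race estimate
    tempname memhold
    tempfile results
    postfile `memhold' year race meanHappy semHappy using `results'

    levelsof year, local(years)
    levelsof race, local(races)
    foreach y of local years {
        foreach r of local races {
            * subpop() keeps the full design in the variance calculation
            quietly svy, subpop(if year == `y' & race == `r'): mean happy
            post `memhold' (`y') (`r') (_b[happy]) (_se[happy])
        }
    }
    postclose `memhold'

    use `results', clear
    * value labels are not carried into the posted file, so re-create them
    label define race 1 "White" 2 "Black" 3 "Other"
    label values race race

    * descending means within each year
    gsort year -meanHappy
    list year race meanHappy semHappy, sepby(year) noobs
    ```

    With 200 countries over eleven years this runs one estimation per cell, so it is slow but simple. A year-country cell with no observations would make svy: mean stop with an error, so real data may need a capture around that line.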
    Last edited by Fei Wang; 30 Oct 2021, 07:22.



    • #3
      The mean command allows pweights for both the mean and the SE of the mean. The following code works, but I believe it can be simplified.

      Code:
      mean happy [pw=sampwgt] if year == 2009, over(race)
      
      * save results in matrix R
          mat R = r(table)[1..2,1...]'
          local rown: rown R
      
      * convert R into a dataset
          svmat R, n(col)
          keep b se
          drop if b == .
      
      * add race variable
          gen race = ""
          local line = 1
          foreach str in `rown' {
              replace race = "`str'" in `line'
              local ++line
          }
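          * row names look like c.happy@1bn.race; strip everything except the digit
          destring race, ignore("c.happy @ .race bn") replace
          * reattach the original value label (named race in this dataset)
          lab values race race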
      
      * sorting
          gsort -b
          ren (b se) (meanHappy semHappy)
          list race meanHappy semHappy
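
      One possible simplification is to let statsby run mean within each year and race and collect _b and _se directly, which avoids the matrix and row-name handling. This is a sketch under assumptions: the stand-in year and happy variables are simpler than the mock-up in #1, and, like the if-restricted code above, it estimates on each subset separately, so the standard errors are not guaranteed to match what svy with subpop() would report.

      ```stata
      webuse highschool, clear

      * stand-in variables (assumptions, simpler than the mock-up in #1)
      set seed 12345
      generate year  = 2009 + floor(4*runiform())
      generate happy = 1 + floor(3*runiform())

      * one row per year-by-race cell, holding the weighted mean and its SE
      * (the /// continuation needs this to be run from a do-file)
      statsby meanHappy=_b[happy] semHappy=_se[happy], by(year race) clear: ///
          mean happy [pweight=sampwgt]

      * descending means within each year
      gsort year -meanHappy
      list year race meanHappy semHappy, sepby(year) noobs
      ```

      Because statsby replaces the data in memory (the clear option), save the original dataset first if it is still needed.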



      • #4
        Originally posted by Fei Wang
        Erika, based on your data, I can sort mean happiness by race within each year.

        Code:
        collapse (mean) meanHappy = happy [pw=sampwgt], by(year race)
        gsort year -meanHappy
        bys year: list race meanHappy
        Unfortunately, the SE of mean in collapse doesn't support pw.
        Thanks so much, Fei. This does exactly what I need in terms of ranking means with pweight applied. Unfortunately, it seems to apply a casewise deletion by default, which (in my actual data set) drops several "races" from the list. I looked at -help collapse- docs, and I see an option to specify -cw- for casewise, but no option to specify anything else (i.e. to not delete)...Hmm. I will need to look at my data and find out why this is happening. I realize you cannot possibly guess, since I didn't provide dataex from my (Gallup World Poll) data here. I will figure out this part. The sorting was the obstacle.


        Originally posted by Fei Wang
        Command mean allows pw for both mean and SE of the mean. The following code works, but I believe it can be simplified.
        Thanks for this too. This makes sense to me. And I'm not sure how to simplify it, so I'm going with it since it seems to work well. Appreciate your functional response here. Thanks, mate.
        I am using Stata SE 16.1.
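
        One way to look into the dropped groups is to count, per cell, how many observations have both a nonmissing outcome and a usable weight. The sketch below rests on an unverified guess about the cause: that the missing groups are cells where every observation has a missing outcome or a missing or zero weight. The stand-in variables and the deliberately blanked weights are assumptions for illustration only.

        ```stata
        webuse highschool, clear

        * stand-in variables (assumptions, simpler than the mock-up in #1)
        set seed 12345
        generate year  = 2009 + floor(4*runiform())
        generate happy = 1 + floor(3*runiform())

        * hypothetical problem cell: blank out the weights for one race in one year
        replace sampwgt = . if race == 3 & year == 2010

        * an observation is usable if outcome and weight are present and the weight is positive
        generate byte usable = !missing(happy, sampwgt) & sampwgt > 0
        bysort year race: egen nUsable = total(usable)
        by year race: generate nTotal = _N

        * show one row for each cell that has no usable observations
        egen tag = tag(year race)
        list year race nTotal nUsable if tag & nUsable == 0, noobs
        ```

        Any cell listed here has nothing to average, which would be a natural place to start checking the real data.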

