  • Conflicting advice on specifying post stratification weights in svyset

    Hi,

    I have tried to sort this one out by looking at previous forum posts, the manual and the YouTube video, but I feel I have been given conflicting advice. I am trying to correct for differences in the age distribution, and I am using Stata 14.1. The official Stata video at https://www.youtube.com/watch?v=lWXhGeT8u5M seems to recommend that, on the "Poststratification" tab of the svyset dialog, you enter a variable holding the population proportion of the reference population in "Poststratum weights" (weight1 below) and the variable identifying the stratum it refers to in "Poststrata" (q10age below). Other sources seem to say that what goes into "Poststratum weights" is a variable calculated as the population percent divided by the sample percent for each stratum (weight2 below). These are obviously quite different; my values are below. The latter makes more sense to me, but the video clearly says otherwise.

    It is also not clear to me how to create a combined poststratification weight when I do not have cell proportions, i.e., if I want poststratification weights for age and gender but only have population proportions for age and gender separately rather than for age*gender.
    q10age     Pop N (PRData)  weight1  Samp N  Samp %  weight2
    Age18-24        16530      0.1065      21   0.0243  4.382716049
    Age25-34        52092      0.3357     170   0.1965  1.708396947
    Age35-44        48054      0.3096     310   0.3584  0.863839286
    Age45-49        16085      0.1036     140   0.1618  0.640296663
    Age50-54        10593      0.0683      99   0.1145  0.59650655
    Age55-64         9147      0.0589      93   0.1075  0.547906977
    Age65+           2688      0.0173      32   0.037   0.467567568
    Total          155189      0.9999     865   1

    Thank-you,
    Anne
    Last edited by Anne Grunseit; 11 May 2016, 02:28.
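The arithmetic behind the two candidate weights can be checked directly. The following is a minimal Stata sketch, not code from the thread: it rebuilds the age table above as one row per respondent, assumes every respondent has a sampling weight of 1 (variable one), and uses hypothetical variable names (popN, sampN, agegrp, sumw, psw). Stata's poststratification multiplies each weight by N_h divided by the sum of weights in poststratum h, so with a weight of 1 a respondent in age group h gets popN_h / n_h. That is weight2 times the constant N/n (155189/865), and weight1 / n_h is the same weight times another constant, so population counts or proportions in postweight() should give the same point estimates for means and proportions. weight2 itself does not belong in postweight(), because Stata would read those ratios as poststratum population sizes.

```stata
* Sketch: rebuild the age table as respondent-level data (hypothetical names)
clear
input str8 q10age popN sampN
"Age18-24" 16530  21
"Age25-34" 52092 170
"Age35-44" 48054 310
"Age45-49" 16085 140
"Age50-54" 10593  99
"Age55-64"  9147  93
"Age65+"    2688  32
end

* weight1 = population proportion; weight2 = population share / sample share
egen double popTot  = total(popN)
egen double sampTot = total(sampN)
generate double weight1 = popN / popTot
generate double weight2 = weight1 / (sampN / sampTot)

* One row per respondent, each with a sampling weight of 1 (assumption)
expand sampN
encode q10age, generate(agegrp)
generate byte one = 1

* Poststratified weight: w * N_h / (sum of w in poststratum h)
bysort agegrp: egen double sumw = total(one)
generate double psw = one * popN / sumw

* psw is weight2 rescaled by the constant N/n
assert reldif(psw, weight2 * popTot / sampTot) < 1e-9

* svyset call that uses the population counts directly
svyset _n [pw = one], poststrata(agegrp) postweight(popN)
```

Under these assumptions, using weight2 directly as a pweight should reproduce the point estimates, but standard errors will generally differ, because only the poststrata()/postweight() route tells svy that the weights were adjusted to known totals.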

  • #2
    1. Post-stratification weights in Stata should be either 1) the population category Ns or 2) the population category proportions. You still have to enter a [pweight = myweight] statement in svyset. Problems arise if a post-stratification category is unknown for some respondents.

    2. To post-stratify on two different marginal totals (age, gender) when you don't have joint totals (age-gender), you need an iterative method called raking. Two contributed Stata commands will do this: Nick Winter's survwgt (SSC) and Stas Kolenikov's ipfraking (findit). The survwgt package also includes a command for non-response weighting. I've used both with success. I recommend ipfraking, because it has an option to limit extreme weight changes, and its help file has useful examples of raking on two margins, which is your situation.

    I myself prefer to enter population totals; however, one must be sure that the totals for both margins (say, age and gender) sum to the same number. As with single-margin post-stratification, missing values on one of the margins can be a problem, though sometimes the population data also include a missing category. Otherwise, you must assign the sample unknowns to one of the other categories, just for raking purposes, and convert these back to missing after creating the raked weights.
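Raking, as described in point 2, amounts to post-stratifying on one margin, then the other, and repeating until both margins match. The Stata sketch below is not ipfraking or survwgt and has none of their convergence checks or weight trimming; it does the iteration by hand on the auto data. The population margins for foreign (100/200) and for a hypothetical indicator highmpg = mpg > 20 (120/180) are made up for illustration; both sum to 300, which is the consistency requirement mentioned above.

```stata
* Sketch: raking (iterative proportional fitting) by hand on the auto data
* The two population margins below are hypothetical; both sum to 300
sysuse auto, clear
generate byte highmpg = mpg > 20
generate double PopForeign = cond(foreign == 0, 100, 200)
generate double PopMpg     = cond(highmpg == 0, 120, 180)

* Start from a sampling weight of 1 (or the design weight, if there is one)
generate double rw = 1
quietly {
    forvalues iter = 1/100 {
        * Post-stratify to the foreign margin
        bysort foreign: egen double s = total(rw)
        replace rw = rw * PopForeign / s
        drop s
        * Post-stratify to the mpg margin
        bysort highmpg: egen double s = total(rw)
        replace rw = rw * PopMpg / s
        drop s
    }
}

* Weighted totals by each margin should now match the targets
tabstat rw, by(foreign) statistics(sum)
tabstat rw, by(highmpg) statistics(sum)

* The raked weight is then used as the pweight
svyset _n [pw = rw]
```

The fixed 100 iterations are an assumption that is ample for this small 2 x 2 example; ipfraking instead iterates to a tolerance and is the better choice for real work, as recommended above.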

    A good guide to raking is

    Battaglia, M. P., Hoaglin, D. C., & Frankel, M. R. (2013). Practical considerations in raking survey data. Survey Practice, 2(5).
    available at: http://www.surveypractice.org/index.php/SurveyPractice/article/view/176/0
    Last edited by Steve Samuels; 11 May 2016, 08:59.
    Steve Samuels
    Statistical Consulting

    Stata 14.2



    • #3
      Hi Steve,

      Thanks for your help with this. If I have understood you correctly, in my case I would enter the variable "Pop N (PRData)" not only in the post-stratification tab but also in the "pweight=" statement? Amazingly, I don't have any missing data for age and gender, so fortunately I do not have to deal with that this time! Do you have any sources I could go to for the case when I do?
      Thanks for the raking references. I will look into these.
      Anne



      • #4
        No: three settings must be specified for post-stratification: the [pweight = ] clause and the poststrata() and postweight() options. The outline of the svyset command is as follows:

        Code:
        svyset psu [pweight = ],  strata() poststrata()  postweight()
        1. The study sampling weight goes in the [pweight= ] clause.
        2. The post-stratification variable, e.g. agegroup, goes in the poststrata() option.
        3. The variable that holds the population counts (or proportions) for the agegroup categories goes in the postweight() option

        Here's an example that uses the auto data set and post-stratifies the "foreign" variable:
        Code:
        clear
        * foreign population totals
        input foreign PopN
        0 100
        1 200
        end
        tempfile t1
        save `t1'
        sysuse auto, clear
        sort foreign
        merge m:1 foreign using `t1'
        tab foreign
        svyset rep78 [pw = turn], poststrata(foreign) postweight(PopN)
        svy: tab foreign, count

        Code:
        . tab foreign

           Car type |      Freq.     Percent        Cum.
        ------------+-----------------------------------
           Domestic |         52       70.27       70.27
            Foreign |         22       29.73      100.00
        ------------+-----------------------------------
              Total |         74      100.00

        . svy: tab foreign, count
        Number of strata   =         1                  Number of obs     =         69
        Number of PSUs     =         5                  Population size   =        300
        N. of poststrata   =         2                  Design df         =          4
        ----------------------
         Car type |      count
        ----------+-----------
         Domestic |        100
          Foreign |        200
                  |
            Total |        300
        ----------------------
          Key:  count     =  weighted count
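One way to see what svyset does with these options is to rebuild the poststratified weights by hand. In this Stata sketch (an editorial check, not part of the original post), PopN is recreated with cond() instead of the merge, and insample, sumw, psw, b_svy and b_hand are hypothetical names. Each car's weight is turn multiplied by PopN divided by the sum of turn in its foreign category, counting only the 69 cars with nonmissing rep78, since svy drops observations with a missing PSU. The hand-made weights should sum to 300 and reproduce the svy point estimate.

```stata
* Sketch: reproduce the poststratified weights from the auto example by hand
sysuse auto, clear
generate double PopN = cond(foreign == 0, 100, 200)   // same totals as above
svyset rep78 [pw = turn], poststrata(foreign) postweight(PopN)

* svy uses only the cars with nonmissing rep78 (the PSU variable)
generate byte insample = !missing(rep78)
bysort foreign: egen double sumw = total(cond(insample, turn, 0))
generate double psw = turn * PopN / sumw if insample

* The hand-made weights add up to the population size
quietly summarize psw
display r(sum)

* Same point estimate as svy with poststratification
svy: mean price
matrix b_svy = e(b)
mean price [pw = psw] if insample
matrix b_hand = e(b)
display mreldif(b_svy, b_hand)
```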
        The raking document I linked to applies to post-stratification as well; indeed, raking is a form of post-stratification. For serious work, use the menus for first-time setup only. The menus record the commands they issue, which you can then paste into the Do-file Editor and save as .do files. This is the only way to document what you have done and to apply fixes without starting from scratch.
        Last edited by Steve Samuels; 11 May 2016, 17:36.


        • #5
          Sorry, just another addendum: I have just read through the ipfraking documents and think I have a handle on it and can create these weights, but it is not clear what you do with them once they are generated. That is, are they entered in svyset, and if so, where? Or are they specified some other way? Thanks for any help.



          • #6
            After creating the raked weights, run svyset with the raked weight as the argument of the [pw = ] clause.

            Note that with ipfraking, you must create a formatted matrix for each margin, with a two-level row name: the prefix (here "r") can be anything, and the part after the colon is the variable value.
            Code:
            matrix total_foreign = 100 \ 200
            matrix rownames total_foreign = r:0 r:1
            matrix list total_foreign

            total_foreign[2,1]
                     c1
                r:0  100
                r:1  200
            Last edited by Steve Samuels; 12 May 2016, 06:49.


            • #7
              Thanks Steve, I appreciate your patience. I guess my confusion about pweight is that I do not have a pweight as such: there was no sampling frame, as the survey was simply distributed via a link in a newsletter (I do not have any data on how many people received it). All the examples do have a pweight, but I am not sure how I would generate one given that I have no sampling frame. This is the first time I have been in this situation.

              Your second post does something slightly different from the article. Following the article, I created the following syntax:

              Code:
              generate byte _one = 1
              matrix PRData_age = (31076,16530,52092,48054,16085,10593, 9147,2688)
              matrix colnames PRData_age = 1 2 3 4 5 6 7 8
              matrix coleq PRData_age = _one
              matrix rownames PRData_age = q10age

              matrix PRData_gender = (79271,106994)
              matrix colnames PRData_gender = 1 2
              matrix coleq PRData_gender = _one
              matrix rownames PRData_gender = q9gender

              matrix list PRData_age, f(%10.0g) //this does show what it should

              matrix list PRData_gender, f(%10.0g) //this does show what it should

              ipfraking [pw=pweight], generate(rakedwgt) ctotal(PRData_age PRData_gender)


              Is this not correct?

              Great that I now know where to put the raked weight!
              Thanks, Anne
              PS: I do use syntax as you recommend, but for this procedure I used the dialog boxes in the first instance.
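Following the advice in #2 that both margins must sum to the same number, the two control-total vectors above can be checked in base Stata before raking. This sketch copies the values from the code above; tot_age and tot_gender are hypothetical names.

```stata
* Control totals copied from the code above
matrix PRData_age    = (31076,16530,52092,48054,16085,10593,9147,2688)
matrix PRData_gender = (79271,106994)

* Row sums: multiply each row vector by a column of ones
matrix tot_age    = PRData_age * J(colsof(PRData_age), 1, 1)
matrix tot_gender = PRData_gender * J(colsof(PRData_gender), 1, 1)
display "age total = " el(tot_age, 1, 1) "   gender total = " el(tot_gender, 1, 1)

* Stop with an error if the margins disagree
if el(tot_age, 1, 1) != el(tot_gender, 1, 1) {
    display as error "margins do not sum to the same total"
    exit 459
}
```

With these values both totals are 186265, so the two margins are consistent.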



              • #8
                Sorry, I forgot: ipfraking expects row vectors, not column vectors. Your code looks correct, but I don't have your data. You should try it out on a known population, like the auto data, which I failed to do.

                Because you don't have a probability sample, set the sampling weight equal to 1. Raking can reduce (but not remove) bias due to known differences between the composition of the sample and the population. However, it won't address other differences between respondents and the population (non-response bias), which is apt to be severe in the kind of study you describe.
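Trying the approach on a known population, as suggested above, might look like the following Stata sketch. Assumptions: the population totals 100 and 200 for foreign are the hypothetical ones from #4; the matrix layout (a row vector whose column names are the category values, whose column equation is the variable of 1s, and whose row name is the control variable) copies Anne's code in #7 rather than the ipfraking documentation itself, so check help ipfraking; and the ipfraking call only runs if the user-written command is installed. Every car starts with a weight of 1, as for a non-probability sample, and the raked weight then goes in the [pw = ] clause of svyset, as described in #6.

```stata
sysuse auto, clear
generate byte _one = 1                  // no probability sample: weight of 1

* Control totals for foreign, laid out as in #7 (hypothetical totals from #4)
matrix total_foreign = (100, 200)       // row vector: one column per category
matrix colnames total_foreign = 0 1     // column names = values of foreign
matrix coleq total_foreign = _one       // column equation = the variable of 1s
matrix rownames total_foreign = foreign // row name = the control variable
matrix list total_foreign

* ipfraking is user-written (findit ipfraking); run it only if installed
capture which ipfraking
if _rc == 0 {
    ipfraking [pw = _one], generate(rakedwgt) ctotal(total_foreign)
    svyset _n [pw = rakedwgt]           // raked weight goes in the [pw = ] clause
    svy: tab foreign, count
}
```

With a single margin, raking reduces to ordinary post-stratification, so the raked weights should simply be 100/52 for domestic cars and 200/22 for foreign cars.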

                Please read FAQ 12 to learn how to put code and results between CODE delimiters. The opening delimiter is [C O D E] and the closing delimiter is [/C O D E], but with the spaces removed.

                Good luck!


                Steve Samuels
