Conflicting advice on specifying post stratification weights in svyset

Anne Grunseit

Join Date: May 2016
Posts: 7

11 May 2016, 02:23

Hi,

I have tried sorting this one out by looking at previous fora, the manual and the YouTube video, but I feel I have got conflicting advice. I am trying to correct for different distributions across age and am using Stata 14.1. The official Stata video at https://www.youtube.com/watch?v=lWXhGeT8u5M seems to recommend that, on the "Poststratification" tab under svyset, you enter a variable indicating the population proportion of the reference population in "Poststratum weights" (weight1 below) and the stratum to which it refers in "Poststrata" (q10age below). Other sources seem to say that what gets entered into "Poststratum weights" is a variable calculated as the population percent divided by the sample percent for each stratum (weight2 below). They are obviously quite different; I have put mine below. The latter makes more sense to me, but the video clearly says otherwise. It is also not clear to me how to create a combined poststratification weight when you do not have cell proportions, i.e., if I want poststratification weights for age and gender but only have population proportions for age and for gender separately, rather than for age*gender.

q10age	Pop N (PRData)	weight1	Samp N	Samp %	weight2
Age18-24	16530	0.1065	21	0.0243	4.382716049
Age25-34	52092	0.3357	170	0.1965	1.708396947
Age35-44	48054	0.3096	310	0.3584	0.863839286
Age45-49	16085	0.1036	140	0.1618	0.640296663
Age50-54	10593	0.0683	99	0.1145	0.59650655
Age55-64	9147	0.0589	93	0.1075	0.547906977
Age65+	2688	0.0173	32	0.037	0.467567568
	155189	0.9999	865	1

Thank you,
Anne

Last edited by Anne Grunseit; 11 May 2016, 02:28.

Tags: poststratification, svyset

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

11 May 2016, 08:43

1. Post-stratification weights in Stata should be either (a) the population category N's or (b) the population category proportions. You do have to enter a [pweight = myweight] statement into svyset. Problems arise if a post-stratification category is unknown for some respondents.

2. To post-stratify on two different marginal totals (age, gender) when you don't have joint totals (age-gender), you need an iterative method called raking. Two contributed commands in Stata will do this: Nick Winter's survwgt (SSC) and Stas Kolenikov's ipfraking (findit). The survwgt package also includes a command for non-response weighting. I've used both with success. I recommend ipfraking, because it has an option to limit extreme weight changes and the help has useful examples of raking two margins, your situation.

I myself prefer to enter population totals; however, one must be sure that the totals for both margins (say, age and gender) sum to the same number. As with single-margin post-stratification, missing values for one of the margins can be a problem, though sometimes the population data also include a missing category. Otherwise you must distribute the sample unknowns into one of the other categories, just for raking purposes. Convert these back to missing after creating the raked weights.
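To make the idea of raking concrete, here is a minimal hand-rolled sketch in Stata. It is not how survwgt or ipfraking are implemented; it only illustrates the iteration of rescaling the weights to one margin, then the other, and repeating. It uses the auto data. The second margin (heavy), the weight variable w, and all population totals are made up for illustration; both sets of totals sum to 300, as required above.

```stata
* Illustration only: hand-rolled raking on two margins with made-up totals
sysuse auto, clear

* hypothetical second margin: 1 if the car weighs more than 3000 lbs
generate byte heavy = weight > 3000

* starting weight of 1 (no design weight)
generate double w = 1

forvalues iter = 1/25 {
    * rescale so the foreign margin sums to 100 (domestic) and 200 (foreign)
    egen double sw = total(w), by(foreign)
    quietly replace w = w * cond(foreign == 0, 100, 200) / sw
    drop sw

    * rescale so the heavy margin sums to 120 (light) and 180 (heavy)
    egen double sw = total(w), by(heavy)
    quietly replace w = w * cond(heavy == 0, 120, 180) / sw
    drop sw
}

* compare the weighted margins with the targets
tabstat w, by(foreign) statistics(sum)
tabstat w, by(heavy) statistics(sum)
```

The fixed 25 passes are a simplification; the contributed commands stop on a convergence criterion instead, and ipfraking can also limit extreme weight changes, which this sketch does not attempt.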

A good guide to raking is

Battaglia, M. P., Hoaglin, D. C., & Frankel, M. R. (2013). Practical considerations in raking survey data. Survey Practice, 2(5).
available at: http://www.surveypractice.org/index.php/SurveyPractice/article/view/176/0

Last edited by Steve Samuels; 11 May 2016, 08:59.

Steve Samuels
Statistical Consulting

Stata 14.2
Anne Grunseit

Join Date: May 2016

Posts: 7
#3

11 May 2016, 15:43

Hi Steve,

Thanks for your help with this. If I have understood you correctly, in my case I would enter the variable "Pop N (PRData)" not only in the post-stratification tab but also in the "pweight=" statement? Amazingly, I don't have any missing data for age and gender, so fortunately I do not have to deal with that this time! Have you any sources I could go to for the case when I do?
Thanks for the raking references. I will look into these.
Anne
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

11 May 2016, 17:29

No: there are three things that must be set for post-stratification: pweight, poststrata(), and postweight(). The outline of the svyset command is as follows:

Code:

svyset psu [pweight = ], strata() poststrata() postweight()

1. The study sampling weight goes in the [pweight= ] clause.
2. The post-stratification variable, e.g. agegroup, goes in the poststrata() option.
3. The variable that holds the population counts (or proportions) for the agegroup categories goes in the postweight() option

Here's an example that uses the auto data set and post-stratifies the "foreign" variable:

Code:

clear
input /// foreign population totals
foreign PopN
0 100
1 200
end
tempfile t1
save `t1'
sysuse auto, clear
sort foreign
merge m:1 foreign using `t1'
tab foreign

. tab foreign

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

svyset rep78 [pw = turn], poststrat(foreign) postweight(PopN)
svy: tab foreign, count

. svy: tab foreign, count

Number of strata   =         1        Number of obs     =         69
Number of PSUs     =         5        Population size   =        300
N. of poststrata   =         2        Design df         =          4

----------------------
 Car type |      count
----------+-----------
 Domestic |        100
  Foreign |        200
          |
    Total |        300
----------------------
  Key:  count  =  weighted count

The Raking document I linked to applies to post-stratification as well; indeed raking is a form of post-stratification. For serious work, use the menus for first-time set up only. The menu will record commands which you can then paste into the do file editor, and save as ".do" files. This is the only way to document what you have done and to apply fixes without starting from scratch.

Last edited by Steve Samuels; 11 May 2016, 17:36.

Anne Grunseit

Join Date: May 2016

Posts: 7
#5

11 May 2016, 18:57

Sorry, just another addendum: I have read through the ipfraking documents and think I have a handle on it and can create these weights, but it is not clear what you do with them once they are generated. That is, are they entered in svyset and, if so, where? Or are they specified some other way? Thanks for any help.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

12 May 2016, 06:31

After creating the raked weights, run svyset with the raked weight as the argument in the [pw = ] statement.
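A minimal sketch of that step, using the auto data. The variable rakedwgt here is only a stand-in computed by hand from the counts shown earlier in the thread (52 domestic and 22 foreign cars, population totals of 100 and 200); in real use it would be the variable created by the raking command.

```stata
* Stand-in for a raked weight; in practice this comes from the raking command
sysuse auto, clear
generate double rakedwgt = cond(foreign == 1, 200/22, 100/52)

* The raked weight goes in the [pw = ] clause
svyset _n [pweight = rakedwgt]

* Weighted counts should reproduce the population totals (100 and 200)
svy: tabulate foreign, count
```

Using _n as the sampling unit treats each observation as its own PSU, which is an assumption made for this illustration rather than a statement about any particular survey design.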

Note that with ipfraking, you must create a formatted matrix for each margin, with a two-level row name; the prefix (here "r") can be anything; the part after the colon is the variable value.

Code:

. matrix total_foreign = 100 \ 200
. matrix rownames total_foreign = r:0 r:1
. matrix list total_foreign

total_foreign[2,1]
      c1
r:0  100
r:1  200

Last edited by Steve Samuels; 12 May 2016, 06:49.

Anne Grunseit

Join Date: May 2016

Posts: 7
#7

13 May 2016, 00:47

Thanks Steve - I appreciate your patience. I guess my confusion about pweight is that I do not have a pweight as such: there was no sampling frame, as the survey was merely sent out by a link in a newsletter (I do not have any data on how many people that is). All the examples do have a pweight, but I am not sure how I would generate one given that I have no sampling frame. This is the first time I have had such a situation.

Your second post does have something slightly different from the article. Following the article, I created the following syntax:

CODE:
generate byte _one = 1
matrix PRData_age = (31076,16530,52092,48054,16085,10593, 9147,2688)
matrix colnames PRData_age = 1 2 3 4 5 6 7 8
matrix coleq PRData_age = _one
matrix rownames PRData_age = q10age

matrix PRData_gender = (79271,106994)
matrix colnames PRData_gender = 1 2
matrix coleq PRData_gender = _one
matrix rownames PRData_gender = q9gender

matrix list PRData_age, f(%10.0g) //this does show what it should

matrix list PRData_gender, f(%10.0g) //this does show what it should

ipfraking [pw=pweight], generate(rakedwgt) ctotal(PRData_age PRData_gender)

Is this not correct?

Great that I now know where to put the raked weight!
Thanks, Anne
PS: I do use syntax as you recommend, but for this procedure I use the dialogue boxes in the first instance.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#8

13 May 2016, 08:21

Sorry-I forgot: ipfraking expects row vectors, not column vectors. Your code looks correct, but I don't have your data. You should try it out on a known population, like the auto data, which I failed to do.

Because you don't have a probability sample, set the sampling weight equal to 1. Raking can reduce (but not remove) bias due to known differences between the composition of the sample and the population. However it won't touch other differences between responders and population (non-response bias), which is apt to be severe in the kind of study you describe.
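This also bears on the weight1 versus weight2 question from the first post. As I read the [SVY] poststratification documentation, Stata multiplies each sampling weight by the poststratum population size divided by the weighted sample size in that poststratum. With a sampling weight of 1, each respondent therefore ends up with Pop N / Samp N, which is weight2 times a constant. So postweight() takes the population counts (or weight1), and the ratio adjustment is done by Stata. The sketch below checks that arithmetic on the table from the first post; the variable names psweight, ratio, PopTot and SampTot are my own, and the age labels are shortened.

```stata
* Aggregated table from the first post (one row per age group)
clear
input str8 q10age PopN SampN
"18-24" 16530  21
"25-34" 52092 170
"35-44" 48054 310
"45-49" 16085 140
"50-54" 10593  99
"55-64"  9147  93
"65+"    2688  32
end

egen double PopTot  = total(PopN)
egen double SampTot = total(SampN)

* weight each respondent gets after post-stratification when pweight = 1
generate double psweight = PopN / SampN

* population share over sample share, as in the weight2 column
generate double weight2 = (PopN / PopTot) / (SampN / SampTot)

* the two differ only by the constant PopTot/SampTot
generate double ratio = psweight / weight2

list q10age psweight weight2 ratio, noobs
```

Because ratio is the same in every row, means and proportions are the same under either scaling; only estimated totals depend on whether counts or proportions are supplied.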

Please read FAQ 12 to learn how to put code and results between CODE delimiters. The opening delimiter is [C O D E] and the closing delimiter is [/C O D E], but with the spaces removed.

Good luck!
