Logistic regression with complex survey design and bootstrap weights

Joe Saunders

Join Date: Mar 2018

Posts: 1
#1


12 Mar 2018, 06:38

Hi everyone, total Stata noob here. I've just started learning the package so that I can use Statistics Canada survey data from the GSS, which comes with bootstrap weights for properly estimating the variability of the estimates (I think!).

I have the survey setup syntax:

Code:

. svyset [pweight=wght_per], bsrweight(wtbs_001- wtbs_500) bsn(25) vce(bootstrap) dof(500) mse

with 500 bootstrap weights per observation.

Then, running:

Code:

. svy: logistic binge_yesno ib(0).sex_01 ib(0).vismin ib(0).imprel

I get odds ratios that agree with what I had before in SPSS (which can't handle bootstrap weights), but the bootstrap weights make the standard errors HUGE, and nothing is even close to significant in the subpopulation I am using. That is totally cool, as long as I haven't done something horribly wrong, which I think I may have.

Major questions: Is there something I must do to adjust these bootstrap weights for a subpopulation? The overall survey has ~33,000 observations, and my population of interest has ~1,250.

When using the survey design and running logistic regression, I don't seem to get any pseudo R-squared values. Is there some way to get them, or is this prohibited by design? Similarly, I am not sure whether I can then run goodness-of-fit tests that give any useful results.

Another important note: my subpopulation represents an estimated 1.2 million people.

Thanks so much for reading this, I'm very clueless and would appreciate any advice on anything you pick up.
Andrew Kenny

Join Date: Sep 2017

Posts: 27
#2

11 Apr 2018, 18:22

Some insight into one element of your question can be found here:

https://www.statalist.org/forums/for...n-svy-logistic

Question:
Can I get a pseudo r-squared in SVY logistic?

In short, the answer is no. However, there are suggestions that other (non-svy) models may give you something close to what you want.
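
One way to read that suggestion, sketched under assumptions: fit the same model without the svy prefix, using only the main sampling weight, and read off the pseudo R-squared that the non-survey command stores. Z here is a hypothetical 0-1 indicator for the subpopulation of interest; the other names come from post #1. The standard errors from this fit ignore the bootstrap weights, so only the pseudo R-squared should be taken from it, and it is probably best treated as a rough descriptive figure, not a design-based measure.

```stata
* Non-svy fit using the main sampling weight only (no bootstrap weights).
* Z is a hypothetical 0-1 subpopulation indicator, not a variable from the original post.
logistic binge_yesno ib(0).sex_01 ib(0).vismin ib(0).imprel if Z == 1 [pweight = wght_per]

* Pseudo R-squared stored by the estimation command
display "Pseudo R2 = " e(r2_p)
```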
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#3

13 Apr 2018, 13:52

Use the subpop() option to get correct standard errors. For example, if you have a 0-1 indicator Z for your subpopulation, use one of:

Code:

svy, subpop(if Z): logistic binge_yesno ib(0).sex_01 ib(0).vismin ib(0).imprel
svy, subpop(if Z==1): logistic binge_yesno ib(0).sex_01 ib(0).vismin ib(0).imprel

Unfortunately, this (correct) analysis will increase standard errors even further. For more information see: https://www.stata.com/manuals/svysub...estimation.pdf
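
To make the indicator concrete, here is a minimal sketch. The variable agegrp and the condition agegrp == 1 are hypothetical stand-ins for however the subpopulation is actually defined; the model and the survey setup are taken from post #1. The idea is that all ~33,000 observations stay in the data and only subpop() restricts the estimation.

```stata
* Assumes the svyset command from post #1 has already been run.
* agegrp is a hypothetical variable; replace the condition with the real definition.
* Observations with missing agegrp get Z = 0 here, because (missing == 1) is false.
generate byte Z = (agegrp == 1)

* Check that Z picks out the expected ~1,250 respondents
tabulate Z

* Estimate on the subpopulation without dropping any observations
svy, subpop(Z): logistic binge_yesno ib(0).sex_01 ib(0).vismin ib(0).imprel
```

Passing the indicator as subpop(Z) is a third equivalent way of writing the same request; dropping the other observations or using a plain if qualifier on the command is what this approach is meant to avoid.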

Steve Samuels
Statistical Consulting

Stata 14.2