Good day,
We are working on a project with complex survey data. The data have been declared as survey data using svyset, which identifies the primary sampling unit (PSU), the pweight, and the stratum variable.
The questionnaire included several skip patterns at both the question and the section level. However, the data do not reflect the skip patterns very well, i.e., some observations have data where missing values were expected based on responses to previous questions.
We would like to restrict analyses of the different sections of the questionnaire to the relevant participants only, based on the skip patterns. To do this, we derived indicator variables from selected variables to identify the observations eligible for each question and section.
We would like to clarify whether using the subpop() option with svy is the correct way to perform an analysis for part of the population. Specifically, is subsetting the sample in this way valid, given that, strictly speaking, the observations excluded by subpop() are expected to have missing data on the variables of interest because of the skip patterns? That is, how does missingness on the analysis variables affect the calculation of the SEs, bearing in mind that here the missingness is expected and by design?
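To illustrate what we mean by an indicator variable, here is a minimal sketch of how one could be derived. The filter question q_mode and its coding (1 = Passenger) are hypothetical placeholders, not the actual names in our data; only QAPassengers is a real variable name.
Code:
* q_mode is a hypothetical filter question; 1 = Passenger is an assumed coding
* QAPassengers is 1 for Passengers, 0 for other respondents, missing if q_mode is missing
generate byte QAPassengers = (q_mode == 1) if !missing(q_mode)
tabulate QAPassengers, missing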
We are not sure what examples (code or output) to present to explain the question further - suggestions would be appreciated here.
Setting the complex survey design (the svyset command is given in the first code block at the end of this post):
Here is an example of a two-way table where we would like to limit the estimation to observations identified as Passengers using QAPassengers (the command is given in the second code block at the end of this post). We know that the data are missing for those who did not identify as Passengers, i.e., where QAPassengers==0. We would like to confirm that this is an appropriate way to subset the data to obtain estimates.
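Because the data do not always follow the skip patterns, a check along the following lines may help show how the indicator lines up with missingness on the table variables. This is only a sketch of what we could present, using the variable names from the example above; we have not included any output here.
Code:
* Cross-tabulate the indicator against whether each table variable is missing
generate byte miss_q80a = missing(q80a)
generate byte miss_q80b = missing(q80b)
tabulate QAPassengers miss_q80a, missing
tabulate QAPassengers miss_q80b, missing
* Count non-Passengers who nevertheless have data on both variables
count if QAPassengers == 0 & !missing(q80a, q80b)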
Thanks.
Code (setting the complex survey design):
svyset EA_Code11 [pw = Benchmark_wgt], strata(Stratum_Anal)
Code (two-way table restricted to Passengers):
svy, subpop(QAPassengers): tab q80a q80b, format(%11.3g) column percent obs ci