Hello,
Background:
We administered a questionnaire to the entire student body of a school (461 students), but only 298 responded (163 non-respondents). We want our sample to be representative of the reference population jointly by gender (male and female) and field of study (the school has 4 fields of study), which gives 8 distinct categories. Comparing the distribution of the reference population (461) with that of our sample (298) across the 8 gender-by-field categories, we found that some categories are under- or over-represented in the sample.
Gender and field of study are known for all 461 students (respondents and non-respondents alike). Note also that some respondents did not answer every question: some questions have, for example, only 235 answers instead of 298.
Questions:
I am a little confused. I don't know whether I should use sampling weights (pweights) or poststratification, or whether the syntax of my command is correct.
Here is my reasoning. Since the questionnaire was sent to the entire reference population and not to a sample:
1. I don't need to adjust for the sampling design by computing the inverse of the probability of inclusion in the sample (pweight).
2. What I want to do is purely adjust for non-response, which varies from question to question.
So I think poststratification is the best solution. It also lets me adjust the weighting for each question, scaling the number of answers to that question up to the total number of students in the school (461).
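To make explicit what I mean by a per-question adjustment, here is a sketch in my own notation (N_g, n_{g,q} and w_{g,q} are labels I am introducing for this post, not Stata names). For category g and question q, each answer would be weighted by the category's population count divided by the number of answers received in that category, assuming every category has at least one answer to the question:

```latex
w_{g,q} = \frac{N_g}{n_{g,q}},
\qquad
\sum_{g=1}^{8} n_{g,q}\, w_{g,q} = \sum_{g=1}^{8} N_g = 461 .
```

With this definition the weighted answers to any question add up to the 461 students in the school, whether the question received 298 answers or only 235.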
I use the following command:
svyset _n, poststrata(sexfil) postweight(n_type) fpc(n_pop)
where sexfil indicates the category to which the student belongs, n_type is the number of students in the reference population in that category, and n_pop is the total number of students in the reference population. [The last column of the table below (wgt) is the inverse of the probability of inclusion in the sample for each category (assuming 298 respondents), which I would use if I chose the pweight option instead: svyset [pw=wgt], strata(sexfil) fpc(n_pop).]
sexfil | n_type | n_pop | wgt    |
1      | 100    | 461   | 100/62 |
2      | 134    | 461   | 134/60 |
3      | 51     | 461   | 51/36  |
4      | 66     | 461   | 66/47  |
5      | 29     | 461   | 29/29  |
6      | 16     | 461   | 16/13  |
7      | 11     | 461   | 11/9   |
8      | 54     | 461   | 54/42  |
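As a quick check on the table, using only the figures above (n_g is my own label for the number of respondents in category g, i.e. the denominator of wgt, and N_g is the n_type value), the counts add up to the totals I quoted, and the weights work out as follows for a few categories:

```latex
\begin{aligned}
\sum_{g=1}^{8} n_g &= 62 + 60 + 36 + 47 + 29 + 13 + 9 + 42 = 298 \\
\sum_{g=1}^{8} N_g &= 100 + 134 + 51 + 66 + 29 + 16 + 11 + 54 = 461 \\
w_1 &= \tfrac{100}{62} \approx 1.61, \qquad
w_2 = \tfrac{134}{60} \approx 2.23, \qquad
w_5 = \tfrac{29}{29} = 1
\end{aligned}
```

So category 2 is the most under-represented among respondents, while category 5 responded in full.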
Does this seem correct to you?
I hope my question is clear. Thanks for your help!