Hi Everyone,
I am analyzing a multi-stage cluster sample and am attempting to appropriately calculate the design effect. The Primary Sampling Units are districts that were selected using PPS with replacement. The population sizes of the districts are large enough that multiple clusters are allocated to a district. Population information below the district is not available and the remaining stages are selected using SRS.
District (dis) | Cluster Id (clus) | Household | Respondent in HH |
1 | 1 | 1 | 3 |
1 | 1 | 2 | 1 |
1 | 1 | 3 | 4 |
1 | 1 | 4 | 1 |
1 | 2 | 1 | 1 |
1 | 2 | 2 | 3 |
1 | 2 | 3 | 2 |
1 | 2 | 4 | 1 |
2 | 3 | 1 | 1 |
When using the svyset command, should the PSU be specified as the district or should the PSU be specified as the cluster id?
For example, should it be:
svyset dis
or
svyset clus
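For context, this is roughly how I would compare the design effects under the two specifications. It is only a sketch: the outcome variable y and the sampling weight wt are placeholder names that are not in my data description above, and I am not claiming that either specification is the correct one.

```stata
* Hypothetical names: y = outcome of interest, wt = sampling weight

* Option 1: treat the district as the PSU
svyset dis [pweight = wt]
svy: mean y
estat effects      // reports DEFF and DEFT for the estimated mean

* Option 2: treat the cluster as the PSU
svyset clus [pweight = wt]
svy: mean y
estat effects
```

The point estimates should be the same either way; only the standard errors, and therefore the design effects, would differ between the two.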
Thank you for the help!