Hi fellow Stata users,
I’m running analysis on a dataset that uses a complex survey design and samples a large portion of the population, so I should use a finite population correction (fpc). As I understand it (based on pages 172-173 of Stata’s Survey Data Reference Manual, Release 13), Stata computes the fpc from a variable whose values contain either the proportion of PSUs sampled within each stratum (the sampling rate) or the total number of PSUs in the population within each stratum.
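For reference, here is my reading of how that fpc value enters the linearized variance. The notation is my own and is an assumption, not copied from the manual. In stratum $h$, $n_h$ PSUs are sampled out of $N_h$. The fpc value gives the sampling rate $f_h$, and each stratum's contribution to the variance is multiplied by $1 - f_h$:

```latex
f_h =
\begin{cases}
\mathrm{fpc}_h, & \mathrm{fpc}_h \le 1 \quad \text{(value is the sampling rate)}\\
n_h / N_h, & \mathrm{fpc}_h = N_h \ge n_h \quad \text{(value is the number of PSUs in the population)}
\end{cases}

\hat{V}(\hat{\theta}) = \sum_{h=1}^{L} (1 - f_h)\,\frac{n_h}{n_h - 1}
  \sum_{i=1}^{n_h} \left( z_{hi} - \bar{z}_h \right)^2
```

Here $z_{hi}$ is the PSU-level total of the linearized scores and $\bar{z}_h$ is their mean within stratum $h$. A stratum in which every PSU is sampled with certainty has $f_h = 1$, so its term is multiplied by $1 - f_h = 0$.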
Several strata contain only PSUs sampled with certainty, so their fpc value (the sampling rate) equals 1. I ran a variety of models (chi-squared tests, regressions). When I define a new stratum for each certainty site, each with an fpc value of 1, the results become less significant (the p-values increase). If a PSU is sampled with certainty, I don’t understand why it should matter whether it is alone in a stratum or shares a stratum with other certainty PSUs.
Any advice would be appreciated. Some of the significance changes are disconcertingly large.
Below is some jerry-rigged, annotated code that demonstrates the issue on a simple example dataset. All analysis was done in Stata 13.1.
Code:
    use http://www.stata-press.com/data/r13/fpc, clear
    list
    gen Nh2 = nh/5
    * Seed added only so the random y is reproducible; the value is arbitrary.
    set seed 12345
    generate double y = runiform()

    * I use the variable Nh2 to calculate the finite population correction.
    * This variable contains 5 obs. sampled with certainty and 3 at a rate of .6
    svyset psuid [pweight=weight], strata(stratid) fpc(Nh2)
    svy: regress y x

    * Now I create a new strata variable where each obs. sampled with certainty
    * is given its own unique stratum (y is a random value in (0,1), so the
    * values 1+y are distinct from each other and from stratum 2).
    gen strata2 = stratid
    replace strata2 = strata2 + y if stratid == 1
    svyset psuid [pweight=weight], strata(strata2) fpc(Nh2)
    svy: regress y x

    * Between the first and second model, we see that p-values are
    * consistently larger when employing the modified strata variable.
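One quantity that does depend on how the certainty PSUs are grouped is the design degrees of freedom. This sketch assumes that Stata uses the usual $df = n - L$ (number of PSUs minus number of strata) and counts the certainty strata in both terms. I would confirm that by comparing `e(df_r)` after each `svy: regress`. Under that assumption, the example dataset (8 PSUs: 5 taken with certainty, 3 at a rate of .6) gives:

```latex
\begin{aligned}
\texttt{stratid}:&\quad df = 8 - 2 = 6, & t_{0.975,\,6} &\approx 2.447\\
\texttt{strata2}:&\quad df = 8 - 6 = 2, & t_{0.975,\,2} &\approx 4.303
\end{aligned}
```

The certainty strata get a factor of $1 - f_h = 0$ under either grouping, so the standard errors themselves should be the same in both models. If the assumption above holds, the drop from 6 to 2 degrees of freedom alone would inflate the critical values and push the p-values up.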