My colleagues and I obtained a random sample of doctors. We distributed two rounds of surveys, separated in time and containing the same set of questions, to our random sample of doctors with replacement (i.e., everyone in the sample received both waves). Across both rounds, about a third of all responses came from people who responded to the survey twice; all others responded only once.

We are interested in analyzing an outcome at the level of specialties, that is, how the outcome changed for Internists vs. Hospitalists vs. others from round 1 to round 2. The random sample of doctors we obtained was stratified by specialty because we did not have the money/resources to do a proper clustered sample. We sampled specialties at different proportions (oversampling rare specialties, etc.).

I would like to treat our response data as a repeated cross-section of specialties rather than doctors, because we think the interesting variation in our outcome over time happened at the level of specialties. However, I also want to use all the information in the individual survey responses instead of collapsing them into specialty units (to preserve demographics, location, etc.).

My question is: Is it methodologically sound to treat these individual responses as a repeated sample of specialties, provided that I construct the weights correctly and specify the model correctly (i.e., is it valid to say I have a "panel" of specialties given correct weights)? Separately, do I need to do anything special when estimating standard errors to account for the subset of people who responded to both surveys? My intuition is to cluster at the individual level, even though most clusters would contain only one response.
Additionally, I have svyset my data using,
Code:
svyset person var [pw=nonresponsewgt*surveywgt], fpc(fpc var) strata(specialty var)
Is this correct for the design I have described? This is the first time I've used the svyset command in Stata, so I'm still learning how to use it correctly. I computed non-response weights as well as survey design weights (as the inverse probability of being sampled) for each response and included the product of these weights in the svyset command, but I'm not sure whether it is correct to do so while also defining the fpc for each stratum.
I am happy to add example data to my post in case my description is unclear.
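In case it helps to make the weighting concrete, here is a minimal sketch of how weights of this kind could be built and passed to svyset. All variable names (person_id, specialty, wave, responded, N_h, n_h) are hypothetical, and the response-rate adjustment within specialty-by-wave cells is an assumption for illustration only, not necessarily how our non-response weights were computed. The sketch stores the product in a single variable so that it does not rely on svyset accepting an expression as the weight.

```stata
* Hypothetical inputs, one row per sampled doctor per wave:
*   N_h       = number of doctors in the specialty (stratum) population
*   n_h       = number of doctors sampled from that specialty
*   responded = 1 if the doctor responded in that wave, 0 otherwise

* Design weight: inverse of the probability of being sampled
generate double surveywgt = N_h / n_h

* Assumed non-response adjustment: inverse of the response rate
* within each specialty-by-wave cell
bysort specialty wave: egen double resp_rate = mean(responded)
generate double nonresponsewgt = 1 / resp_rate

* Final weight: product of the design and non-response weights
generate double finalwgt = surveywgt * nonresponsewgt

* Keep respondents only, then declare the design:
* person as PSU, specialty as stratum, stratum population size as fpc
keep if responded == 1
svyset person_id [pweight = finalwgt], strata(specialty) fpc(N_h)
```

With person_id as the PSU, the two responses from anyone who answered both waves fall in the same cluster, which is the individual-level clustering I had in mind.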