I'm trying to compare a subpopulation to the overall population for the purpose of evaluating survey nonresponse bias. In other words, I have frame data for the entire sample (both respondents and nonrespondents), and I want to run a t-test to evaluate whether the proportion of the respondent subpopulation with a certain characteristic is significantly different from the proportion of the overall population (not just the proportion of the nonrespondent population; I want the "respondent + nonrespondent" population) with that characteristic. Furthermore, I need to use jackknife standard errors in this analysis.
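For reference, the comparison can be written out as follows. The notation is introduced here and is not from the original post. $y_i$ is 1 if unit $i$ has the characteristic, $c_i$ is the respondent indicator (`complete`), $w_i$ is the full-sample weight, and $\hat p^{(r)}$ denotes the same estimator recomputed with the $r$-th of $R$ replicate weights (here $R = 70$). The multiplier $m$ depends on the jackknife design and is an assumption in this sketch, for example $(R-1)/R$ for a delete-one JK1 design. Because respondents ($c_i = 1$) enter both proportions, $\hat p_{\mathrm{all}}$ and $\hat p_{\mathrm{resp}}$ are correlated. The variance of their difference is therefore not the sum of the two separate variances, and computing $\hat d^{(r)}$ replicate by replicate is one way to capture that covariance.

$$
\hat p_{\mathrm{all}} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}, \qquad
\hat p_{\mathrm{resp}} = \frac{\sum_{i=1}^{n} w_i c_i y_i}{\sum_{i=1}^{n} w_i c_i}, \qquad
\hat d = \hat p_{\mathrm{all}} - \hat p_{\mathrm{resp}}
$$

$$
\hat d^{(r)} = \hat p_{\mathrm{all}}^{(r)} - \hat p_{\mathrm{resp}}^{(r)}, \qquad
\hat v_{\mathrm{mse}}(\hat d) = m \sum_{r=1}^{R} \bigl(\hat d^{(r)} - \hat d\bigr)^2, \qquad
t = \frac{\hat d}{\sqrt{\hat v_{\mathrm{mse}}(\hat d)}}
$$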
In theory, the best approach would seem to be to run "svy: tabulate" (or "svy: proportion") on the overall sample, run it again with the "if" qualifier restricting the sample to respondents only, and then use the "suest" command to compare the proportions from the two tabulations. Unfortunately, "suest" does not support jackknife standard errors. I've come up with a workaround, but I'm not sure if it's correct, so I was hoping to get some input. Here's an example of what I'm doing, using a hypothetical "gender" variable:
svyset [pweight=pweight], vce(jackknife) jkrweight(jkweight1-jkweight70) mse
expand 2 if complete==1, generate(respondentsonly) /*Duplicating the respondent observations and creating a new variable respondentsonly that equals 1 for the "respondents only" sample and 0 for the "respondents + nonrespondents" sample*/
svy: proportion gender, over(respondentsonly)
lincom _b[Male:0] - _b[Male:1] /*Testing whether the estimated population proportion of males from the "respondents + nonrespondents" sample is different from estimated proportion from the "respondents only" sample*/
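The following Python sketch (not Stata) illustrates what a replicate-weight jackknife does for the `lincom` difference, under stated assumptions. It recomputes both weighted proportions with every replicate weight column, takes the difference within each replicate, and centers the squared deviations on the full-sample difference, as the `mse` option does. The toy data, the helper names `weighted_prop` and `jackknife_diff`, and the single common multiplier (here the delete-one JK1 value (n-1)/n) are all hypothetical. Stata's actual multipliers depend on how `jkrweight()` is specified.

```python
import numpy as np

def weighted_prop(y, w):
    """Weighted proportion of units with y == 1."""
    return np.sum(w * y) / np.sum(w)

def jackknife_diff(y, complete, w, rep_w, multiplier):
    """Return (d_hat, se) for d = p_all - p_resp.

    Hypothetical helper: rep_w is an (n, R) array of replicate weights and
    `multiplier` is assumed to be the same for every replicate.
    """
    resp = complete == 1
    d_hat = weighted_prop(y, w) - weighted_prop(y[resp], w[resp])
    # Difference recomputed under each replicate weight column
    d_rep = np.array([
        weighted_prop(y, rep_w[:, r]) - weighted_prop(y[resp], rep_w[resp, r])
        for r in range(rep_w.shape[1])
    ])
    # mse-style variance: deviations from the full-sample estimate, not the replicate mean
    var = multiplier * np.sum((d_rep - d_hat) ** 2)
    return d_hat, np.sqrt(var)

# Hypothetical toy data: y = has the characteristic, complete = respondent
y = np.array([1, 0, 1, 1, 0, 0])
complete = np.array([1, 0, 1, 1, 0, 1])
w = np.ones(6)
n = len(y)
# Delete-one (JK1) replicate weights: column r drops unit r and rescales the rest by n/(n-1)
rep_w = np.tile(w[:, None] * n / (n - 1), (1, n))
np.fill_diagonal(rep_w, 0.0)

d_hat, se = jackknife_diff(y, complete, w, rep_w, multiplier=(n - 1) / n)
print(d_hat, se)
```

The t statistic would then be `d_hat / se`. Because the respondents appear in both proportions within every replicate, the replicate differences already reflect their covariance, so no separate covariance term is added.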
What I'm specifically concerned about is the way the "over" option interacts with the duplicated respondent observations. From what I've been able to find in Stata's subpopulation estimation documentation, the "if" qualifier seems to be the more appropriate way to split the sample into the "respondent + nonrespondent" observations and the (duplicate) respondent observations, since I don't want Stata to count the duplicate observations twice when calculating the standard errors; however, I can't figure out a way to run a hypothesis test on the coefficients from two separate tabulations. I'm very much a nonspecialist when it comes to survey variance estimation, so I thought I'd see if anyone here can tell me whether my hack is correct, or if I've made a monumentally stupid mistake (which is entirely possible). It's worth noting that the proportions and standard errors from "svy: proportion gender, over(respondentsonly)" are the same as those from running "svy: proportion gender if respondentsonly==1" and "svy: proportion gender if respondentsonly==0" separately.
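Here is a small Python sketch of why the stacked data need not double count anything. The data and the helper name `group_props` are hypothetical, and it assumes the jackknife simply recomputes each group's weighted proportion under every weight column. Within each level of `respondentsonly`, every respondent appears exactly once, so the per-group estimates, including every replicate estimate, match the separate subset runs. This is consistent with the identical results reported above. The duplicates link the two groups only through shared replicate weights, which is what lets replicate-by-replicate differences reflect the covariance between the two proportions.

```python
import numpy as np

def group_props(y, group, W):
    """Weighted proportion of y == 1 within each group, for every weight column of W."""
    out = {}
    for g in np.unique(group):
        m = group == g
        out[int(g)] = (W[m] * y[m][:, None]).sum(axis=0) / W[m].sum(axis=0)
    return out

# Hypothetical toy data; column 0 of W plays the role of pweight,
# the remaining columns play the role of jkweight1-jkweightR.
y = np.array([1, 0, 1, 1, 0, 0])
complete = np.array([1, 0, 1, 1, 0, 1])
n = len(y)
W = np.hstack([np.ones((n, 1)), np.full((n, n), n / (n - 1))])
W[np.arange(n), np.arange(n) + 1] = 0.0  # replicate r drops unit r (delete-one)

# Analogue of: expand 2 if complete==1, generate(respondentsonly)
resp = complete == 1
y_st = np.concatenate([y, y[resp]])
grp_st = np.concatenate([np.zeros(n, dtype=int), np.ones(resp.sum(), dtype=int)])
W_st = np.vstack([W, W[resp]])

# Analogue of svy: proportion ..., over(respondentsonly)
stacked = group_props(y_st, grp_st, W_st)
# Analogues of the two separate "if" runs
full_only = group_props(y, np.zeros(n, dtype=int), W)[0]
resp_only = group_props(y[resp], np.zeros(resp.sum(), dtype=int), W[resp])[0]
print(np.allclose(stacked[0], full_only), np.allclose(stacked[1], resp_only))
```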
Thanks! Let me know if you need any more info; I'm new to Statalist, so please excuse any rookie mistakes or omissions.