I am using Stata 13.1 to work with survey data that has item non-response, and I am having a hard time figuring out how to set subpop().
In particular, the "Number of obs" and "Population size" change in ways I did not expect.
To demonstrate the problem, I took the nhanes2d dataset from the web and modified it by
- limiting the data to the first 5000 records
- introducing missing data
- pretending the design is stratified random sampling, by overriding svyset to use only strata and weights.
I then ran "svy: tab" in three ways:
#1 svy: tab heartatk, ci format(%9.0g) , if female==1
#2 svy, subpop(female): tab heartatk, ci format(%9.0g)
#3 svy, subpop(if female==1 & heartatk!=.): tab heartatk, ci format(%9.0g)
All three returned different values for
- Number of obs (top right of the output);
- Population size; and
- Confidence intervals (lb and ub)
The first difference (Number of obs) looked natural at first, but not once I looked at the value in #2. Where does the "Number of obs" of 3301 come from?
It is not the number of persons with non-missing values in "female", which is 4001.
I'd appreciate any comments.
Here are the results from Stata.
Code:
. use http://www.stata-press.com/data/r13/nhanes2d, clear

. keep if _n<=5000 // limit the sample
(5351 observations deleted)

. replace heartatk=. if _n<1000 // create missing to the analyzed variable
(999 real changes made, 999 to missing)

. replace female=. if _n>700 & _n<1700 // create missing to to the subpop() variable
(999 real changes made, 999 to missing)

. ta female heartatk, mis // show the missing patterns

 1=female, |     heart attack, 1=yes, 0=no
    0=male |         0          1          . |     Total
-----------+---------------------------------+----------
         0 |     1,437        115        329 |     1,881
         1 |     1,700         49        371 |     2,120
         . |       657         43        299 |       999
-----------+---------------------------------+----------
     Total |     3,794        207        999 |     5,000

. svyset [pweight=finalwgt],strata(strata) // pretend simple random sampling

      pweight: finalwgt
          VCE: linearized
  Single unit: missing
     Strata 1: strata
         SU 1: <observations>
        FPC 1: <zero>

. ** Now run svy: tab in three ways
. svy: tab heartatk, ci format(%9.0g) , if female==1
(running tabulate on estimation sample)

Number of strata   =        11                 Number of obs      =       1749
Number of PSUs     =      1749                 Population size    =   19445224
                                               Design df          =       1738

-------------------------------------------------
heart     |
attack,   |
1=yes,    |
0=no      | proportions           lb           ub
----------+--------------------------------------
        0 |    .9788396     .9704661     .9848761
        1 |    .0211604     .0151239     .0295339
          |
    Total |           1
-------------------------------------------------
  Key:  proportions  =  cell proportions
        lb           =  lower 95% confidence bounds for cell proportions
        ub           =  upper 95% confidence bounds for cell proportions

. svy, subpop(female): tab heartatk, ci format(%9.0g)
(running tabulate on estimation sample)

Number of strata   =        11                 Number of obs      =       3301
Number of PSUs     =      3301                 Population size    =   36690922
                                               Subpop. no. of obs =       1749
                                               Subpop. size       =   19445224
                                               Design df          =       3290

-------------------------------------------------
heart     |
attack,   |
1=yes,    |
0=no      | proportions           lb           ub
----------+--------------------------------------
        0 |    .9788396     .9704643      .984877
        1 |    .0211604      .015123     .0295357
          |
    Total |           1
-------------------------------------------------
  Key:  proportions  =  cell proportions
        lb           =  lower 95% confidence bounds for cell proportions
        ub           =  upper 95% confidence bounds for cell proportions

  Note: 3 strata omitted because they contain no subpopulation members.

. svy, subpop(if female==1 & heartatk!=.): tab heartatk, ci format(%9.0g)
(running tabulate on estimation sample)

Number of strata   =        11                 Number of obs      =       3375
Number of PSUs     =      3375                 Population size    =   37657063
                                               Subpop. no. of obs =       1749
                                               Subpop. size       =   19445224
                                               Design df          =       3364

-------------------------------------------------
heart     |
attack,   |
1=yes,    |
0=no      | proportions           lb           ub
----------+--------------------------------------
        0 |    .9788396     .9704638     .9848773
        1 |    .0211604     .0151227     .0295362
          |
    Total |           1
-------------------------------------------------
  Key:  proportions  =  cell proportions
        lb           =  lower 95% confidence bounds for cell proportions
        ub           =  upper 95% confidence bounds for cell proportions

  Note: 5 strata omitted because they contain no subpopulation members.
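As a cross-check, the candidate counts behind "Number of obs" can be tallied directly. This is only a sketch: it assumes the modified nhanes2d data from the log above is still in memory, and it uses nothing beyond the built-in count command and missing() function.

```stata
* Sketch only: assumes the modified data from the log is in memory.
count if !missing(female)                // non-missing subpop() variable
count if !missing(heartatk)              // non-missing analysis variable
count if !missing(female, heartatk)      // non-missing on both variables
count if female==1 & !missing(heartatk)  // subpopulation members analysed
```

Reading the same cells off the cross-tabulation in the log gives 4,001, 4,001, 3,301 and 1,749, so these can be compared against the "Number of obs" and "Subpop. no. of obs" reported by each of the three runs.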