  • Stata including category with zero observations in survey tabulate output

    All,

    I am running two-way tabulations on data that are svyset for weighting, stratification, etc. The column variable has 3 categories, but the variable that defines the subpopulation of interest excludes one of the three categories in the column variable. I was expecting that running the two-way table with the defined subpopulation would automatically exclude the column that has no observations. Instead, it prints the column of all zeroes and does not provide significance tests, explaining that these can't be computed because the marginals contain a zero. I am guessing this has a simple solution that I am just overlooking. Below is the code that I am using and a sample table that is output. The subpopulation identifier is defined as chs_strict > 0, leaving only cases with a value of 1 or 2 in the chs_strict variable.

    As always, thank you for any help.

    Code:
    svyset hosp_ed [pw=discwt], strata(neds_stratum) single(centered)
    
    foreach var of varlist race female pay1 pay2 age_cat catag3 pregnant pl_nchs region disp_ed ///
            aweekend year {
        svy, subpop(subp): tabulate `var' chs_strict, count column percent ///
                format(%12.2g) cellwidth(15) stubwidth(15) pearson
    }
    [Screenshot of the sample output table: Screenshot 2024-12-18 at 1.57.14 PM.png]

  • #2
    James is prudent in expressing caution about using subpop() vs. if with complex survey data. You may encounter problems with incorrect standard error computations. West, Berglund and Heeringa (2008) provide a thorough overview of when the problems happen. I would verify that using if to filter the data set produces the same standard errors, and then proceed to the tabulation with the empty column wiped out.

    Here's the one with the problem:

    [Screenshot: Stata Screenshot 2024-12-19 151954.png]



    Here's the one with the column wiped out -- the standard errors are the same so the test statistic is reliable.

    [Screenshot: Stata Screenshot 2024-12-19 152053.png]
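
    The two runs compared above could be produced along these lines. This is only a sketch, not the exact commands behind the screenshots: it borrows race, chs_strict and subp from the original post, assumes the data are already svyset and that subp is a 0/1 indicator, and requests only count with se because svy: tabulate accepts just one of cell, count, row or column when standard errors are displayed.

    ```stata
    * Assumption: the data are already svyset and subp is a 0/1 subpopulation indicator.

    * Run 1: subpop() only. The excluded chs_strict category is still listed,
    * as a column of zeros.
    svy, subpop(subp): tabulate race chs_strict, count se format(%12.2g)

    * Run 2: subpop() plus a matching if. The empty column should no longer be listed.
    * "if subp == 1" is used so that missing values of subp are not treated as true.
    svy, subpop(subp): tabulate race chs_strict if subp == 1, count se format(%12.2g)

    * Only if the standard errors from the two runs agree: request the
    * column percentages and the Pearson test on the restricted table.
    svy, subpop(subp): tabulate race chs_strict if subp == 1, column percent pearson
    ```

    If the standard errors from the first two runs differ, the if restriction has changed the variance estimation and the test from the third run should not be trusted.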


    You can still encounter problems -- see

    Code:
    svy , subpop( if !smsa1 ) : tab diabetes smsa, se count format(%12.0f)
    svy , subpop( if !smsa1 ) : tab diabetes smsa if !smsa1, se count format(%12.0f)
    which actually interacts badly with the sampling design, as some of the strata have entirely metro or entirely non-metro PSUs. So check also that the degrees of freedom are fine, etc.
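
    One way to check the design information is to look at the results that svy stores after each run. The sketch below assumes the example uses the nhanes2 data shipped with Stata (webuse nhanes2) and that smsa1 is the indicator found in that file; the svyset line is restated in case the settings are not saved with the data.

    ```stata
    * Assumption: the example refers to the nhanes2 data set.
    webuse nhanes2, clear
    svyset psu [pweight=finalwgt], strata(strata)

    * subpop() only: all strata and PSUs stay in the variance computation.
    svy, subpop(if !smsa1): tabulate diabetes smsa, se count format(%12.0f)
    display "strata: " e(N_strata) "  PSUs: " e(N_psu) "  design df: " e(df_r)

    * subpop() plus if: observations outside the subpopulation are dropped first.
    svy, subpop(if !smsa1): tabulate diabetes smsa if !smsa1, se count format(%12.0f)
    display "strata: " e(N_strata) "  PSUs: " e(N_psu) "  design df: " e(df_r)
    ```

    Fewer strata or PSUs, a smaller design df, or missing standard errors in the second run would suggest that the if restriction has altered the design and that its results should not be used.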
    -- Stas Kolenikov || http://stas.kolenikov.name
    -- Principal Survey Scientist, Abt SRBI
    -- Opinions stated in this post are mine only


    • #3
      Thanks very much, Stas. I am also testing it using CIs, but this looks like it solves the issue. I am still not sure why Stata includes the all-zero column.
