
  • Survey analysis: Error with single stratum

    Hi everyone,

    I was wondering how to go about debugging a message that I received during my analysis. I have tried reading through forum posts and the Stata manual, but I am having a mental block. What does the error "missing standard error because of stratum with single sampling unit" mean, and how do I solve the issue? This is my first foray into survey analysis, so I greatly appreciate the community's patience and guidance.

    I have copied and pasted two examples of my code and the accompanying error messages below.

    The database that I am using is the National Readmissions Database. The survey design variables are hosp_nrd (the hospital at which the patient is admitted and from which they are discharged, used as the PSU), discwt (the weighting variable), and nrd_stratum (the stratification variable).


    Code:
    svyset hosp_nrd [pw=discwt], strata(nrd_stratum)
    Code:
    . svy: mean totchg_readmit if thirty_days_readmit==1
    (running mean on estimation sample)
     
    Survey: Mean estimation
     
    Number of strata =      81        Number of obs   =      2,096
    Number of PSUs   =     781        Population size = 4,852.0524
                                      Design df       =        700
     
    ----------------------------------------------------------------
                   |             Linearized
                   |       Mean   Std. Err.     [95% Conf. Interval]
    ---------------+------------------------------------------------
    totchg_readmit |   43062.96          .             .           .
    ----------------------------------------------------------------
    Note: Missing standard error because of stratum with single
          sampling unit.

    For the following code, I am trying to look at the percentages of post-op complications in IBD patients who are obese versus those who are not obese. I run into the same problem: a stratum with only a single sampling unit.

    Code:
    . svy if surg_ibd==1 & elective==0: tab post_op_comp obese, col percent se
    (running tabulate on estimation sample)
     
    Number of strata   =        85                  Number of obs     =      6,219
    Number of PSUs     =     1,137                  Population size   = 14,526.108
                                                    Design df         =      1,052
     
    -------------------------------
    post_op_c |        obese      
    omp       |     0      1  Total
    ----------+--------------------
            0 | 93.26  93.22  93.26
            1 | .4224  .6748  .4395
            2 | .5327  .8139  .5517
            3 | .0465  .5594  .0813
            4 | .1935      0  .1804
            5 | 4.499  3.375  4.423
            6 | .4165  .2355  .4042
            7 | .4065  .5594  .4169
            8 | .2247  .5594  .2474
              |
        Total |   100    100    100
    -------------------------------
      Key:  column percentage
     
      Pearson:
        Uncorrected   chi2(8)         =   18.1021
        Design-based  F(., .)         =         .     P =      .
     
    Note: Missing test statistics because of stratum with single sampling unit.
    Note: Missing standard errors because of stratum with single sampling unit.

    My question is: how can I continue my analyses when a stratum has only a single sampling unit?

    Thank you in advance for everyone's time!
    Last edited by Nghia Nguyen; 20 Feb 2017, 21:26.

  • #2
    You should not use the if qualifier to restrict svyset data to a subgroup. Use the subpop option of the svy prefix instead.

    To calculate a standard error, svyset data must have at least two primary sampling units in every stratum. Imposing an if condition can drop entire PSUs and leave a stratum with only one. The subpop option keeps the full design, so the standard errors are calculated correctly.
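A minimal sketch of the difference, using a small simulated design rather than the NRD. The variables stratum, psu, wt, y and sub are hypothetical. No observation in PSU 2 (stratum 1) belongs to the subgroup. Restricting with -if- therefore drops that PSU and leaves stratum 1 with a single PSU. -subpop()- keeps every PSU in the variance calculation.

```stata
* Hypothetical toy design: 4 strata, 2 PSUs per stratum, 5 obs per PSU
clear
set obs 40
set seed 2017
generate stratum = ceil(_n/10)
generate psu     = ceil(_n/5)
generate wt      = 1
generate y       = rnormal()

* Hypothetical subgroup: odd-numbered obs, but nobody from PSU 2
generate byte sub = psu != 2 & mod(_n, 2) == 1

svyset psu [pweight = wt], strata(stratum)

* -if- discards PSU 2, so stratum 1 keeps one PSU and the SE is missing
svy: mean y if sub == 1

* -subpop()- keeps the full design; the SE is computed
svy, subpop(sub): mean y

* Applied to the commands in the question (needs the NRD data, not run here):
*   svy, subpop(if thirty_days_readmit==1): mean totchg_readmit
*   svy, subpop(if surg_ibd==1 & elective==0): tabulate post_op_comp obese, col percent se
```

The point estimate of the subgroup mean is the same either way. Only the variance calculation changes, because -subpop()- lets PSUs with no subgroup members still count toward each stratum's number of PSUs.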

    More information:
    http://www.cpc.unc.edu/research/tool...s/svy_commands
    Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

    When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.



    • #3
      The error message you are getting means exactly what it says. You have some stratum (possibly more than one) that contains only a single PSU. The formulas used to calculate standard errors contain fractions with the number of PSUs minus 1 in the denominator, so they fail when a stratum has only one PSU. To fix it, you need to modify your data. The first step is to run -svydes-, which lists all of the strata and how many PSUs (called Units in the output) each contains. Scan that output to identify any strata that contain only 1 unit. You then need to reassign the PSU in that singleton stratum (or in those singleton strata, if more than one stratum is affected) to some other stratum.
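For reference, here is a sketch of the default linearized (Taylor-series) variance estimator that svy uses, written without the finite population correction. In this notation, $h = 1, \dots, L$ indexes strata, $n_h$ is the number of PSUs in stratum $h$, and $z_{hi}$ is the weighted PSU total of the linearized variable for PSU $i$ in stratum $h$. When $n_h = 1$, the factor $n_h/(n_h - 1)$ has a zero denominator, which is why Stata reports the standard error as missing.

$$
\hat V(\hat\theta) = \sum_{h=1}^{L} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( z_{hi} - \bar z_h \right)^2,
\qquad
\bar z_h = \frac{1}{n_h} \sum_{i=1}^{n_h} z_{hi}
$$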

      The choice of which other stratum (or strata) to assign the singleton(s) to depends on how the strata were originally defined. For each singleton, choose a new stratum whose PSUs are as similar as possible to the singleton on the characteristics used to define the strata. So, for example, if your strata were defined by population count, you would prefer to assign a singleton large city to a stratum with other large cities, not to one that contains small towns.
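A minimal sketch of both steps on simulated data. The variable names are hypothetical, and in this toy stratum 4 is built to contain a single PSU. The egen lines are one way to flag singleton strata programmatically alongside the -svydescribe- listing. The target stratum (3) is an arbitrary choice here. With real data, pick it using the similarity rule described above.

```stata
* Hypothetical toy design: stratum 4 ends up with only one PSU
clear
set obs 40
set seed 2017
generate stratum = ceil(_n/10)
generate psu     = ceil(_n/5)
replace  psu     = 7 if stratum == 4   // merge stratum 4's two PSUs into one
generate wt      = 1
generate y       = rnormal()
svyset psu [pweight = wt], strata(stratum)

* Step 1: list strata and their PSU counts ("Units" column)
svydescribe

* Step 1 (alternative): count distinct PSUs per stratum and flag singletons
egen psutag = tag(stratum psu)
bysort stratum: egen npsu = total(psutag)
tabulate stratum if npsu == 1

* Step 2: move the singleton's PSU into a similar stratum (3, hypothetically)
replace stratum = 3 if stratum == 4

* svyset refers to the variable by name, so the new strata are used directly
svy: mean y    // the standard error is no longer missing
```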

      That said, Weiwen is right that you should not use the -if- qualifier to do survey data analysis on subsets of your data. You should use the -subpop()- option of the svy prefix. Using -if- will give you incorrect standard errors, even though the point estimates are the same.



      • #4
        Hi Weiwen and Clyde,

        Thank you so much for your helpful suggestions! I used the -subpop()- option and that solved the issue!

        Best,
        Nghia



        • #5
          I was having the same problem, and the svydes command that Clyde suggested is indeed the way to go for finding the singleton strata. Once I detected the singletons, I reassigned them. Running the svydes command now shows that there are no singletons in my dataset.

          However, when I try to run a regression, Stata is still not able to calculate the SEs, and I'm still getting that message at the bottom: Missing standard error because of stratum with single sampling unit.

          What else could be causing this?



          • #6
            In case someone else is having this issue, I found a convincing answer elsewhere:

            Missing data can cause entire sampling units to be dropped from the analysis, possibly leaving a single sampling unit in the estimation sample.

            In other words, remember that regress uses only observations with no missing values in any of the model variables. Dropping the other observations can leave a stratum with a single PSU, even if svydes reports no singletons before you run the regression.
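A minimal sketch of this situation on simulated data with hypothetical variable names (stratum, psu, wt, y, x). -svydescribe- on the full data shows two PSUs in every stratum. However, x is missing for every observation in PSU 2, so -regress- drops that PSU and stratum 1 becomes a singleton. Marking the observations that the model can use, with -mark- and -markout-, and describing the design on those observations only reveals the hidden singleton.

```stata
* Hypothetical toy design: 4 strata, 2 PSUs each, no singletons in the full data
clear
set obs 40
set seed 2017
generate stratum = ceil(_n/10)
generate psu     = ceil(_n/5)
generate wt      = 1
generate y       = rnormal()
generate x       = rnormal()
replace  x       = . if psu == 2   // x missing for all of PSU 2 (stratum 1)
svyset psu [pweight = wt], strata(stratum)

svydescribe            // full data: 2 units in every stratum
svy: regress y x       // SEs missing: stratum 1 has one PSU in the estimation sample

* Describe the design using only observations with no missing model variables
mark touse
markout touse y x
svydescribe if touse   // stratum 1 now shows a single unit
```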



            • #7
              Replying to Clyde Schechter (#3):

              Thank you for the very good clarification.

