  • bsample with strata of just one cluster

    Hi, I am using the bsample command and struggling with the error "singleton cluster detected".

    The following code illustrates the issue.
    Code:
    . sysuse auto, replace
    (1978 automobile data)
    
    . bsample, strata(headroom) cluster(make)
    singleton cluster detected
    r(460);
    
    end of do-file
    
    r(460);
    I think the problem is that, as the following output shows, stratum 5.0 includes just one cluster.
    That is, Stata has only one cluster to draw from within stratum 5.0.
    Code:
    . tab headroom
    
       Headroom |
          (in.) |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            1.5 |          4        5.41        5.41
            2.0 |         13       17.57       22.97
            2.5 |         14       18.92       41.89
            3.0 |         13       17.57       59.46
            3.5 |         15       20.27       79.73
            4.0 |         10       13.51       93.24
            4.5 |          4        5.41       98.65
            5.0 |          1        1.35      100.00
    ------------+-----------------------------------
          Total |         74      100.00
    What I want is simply to draw that one cluster whenever a stratum contains just one cluster.
    But I am not sure how to make Stata do this.

    How can I solve this?

  • #2
    Yes, it seems that -bsample- does not permit strata with singleton clusters. I could not find any mention of this restriction in a quick read of the help file and the manual section. All that is said there is that the number of clusters requested by the expression that follows -bsample- must not exceed the number of clusters. But your command has no expression there, so, by default, the number of clusters requested equals the number of clusters available, and I find nothing there to suggest that a singleton is not allowed. So this may be a bug. (On the other hand, running -bsample- with trace on reveals that the code explicitly tests for singleton clusters within strata and terminates with an error if any are found, so it seems to be the behavior the author of the code intended. Maybe it's a problem with the documentation rather than a bug in the code.)
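    As an aside, the strata that trip this check can be listed before calling -bsample-. The following is a minimal sketch, assuming the same auto data with headroom as the stratum and make as the cluster; the variable names tag and n_clusters are just illustrative.

```stata
sysuse auto, clear

// Flag exactly one observation per (stratum, cluster) pair
egen byte tag = tag(headroom make)

// Count the distinct clusters within each stratum
egen n_clusters = total(tag), by(headroom)

// Show the strata that contain only one cluster
tabulate headroom if n_clusters == 1
```

    With this data the only stratum listed should be headroom 5.0, matching the single observation in the poster's -tab headroom- output.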

    Anyway, to do what you want, you have to write some code that emulates what -bsample- would do if this were allowed:
    Code:
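    clear*
    
    sysuse auto, clear
    
    // Levels of the stratum variable (not needed by the code below)
    levelsof headroom, local(strata)
    local n_strata: word count `strata'
    
    // Number the clusters 1..n_clusters within each stratum
    by headroom (make), sort: gen cluster_id = sum(make != make[_n-1])
    by headroom (make): gen n_clusters = cluster_id[_N]
    
    tempfile holding
    save `holding'
    
    set seed 1234 // OR YOUR PREFERRED SEED
    
    // Keep one row per cluster to be drawn in each stratum
    keep headroom n_clusters
    by headroom, sort: keep if _n <= n_clusters
    
    // Draw cluster ids with replacement within each stratum
    gen cluster_id = runiformint(1, n_clusters)
    drop n_clusters
    
    // Attach the observations of each drawn cluster
    joinby headroom cluster_id using `holding'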
    Note: If you are going to write a loop that iterates the sampling with replacement, only the part from -keep headroom n_clusters- onward needs to be in the loop. The code that precedes that is one-time setup.
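    To illustrate that note, here is a rough sketch of such a loop, not a definitive implementation. The number of replications (reps) is an arbitrary placeholder, and the estimation step is left as a comment because it depends on your analysis.

```stata
sysuse auto, clear

// One-time setup: number the clusters within each stratum and save
by headroom (make), sort: gen cluster_id = sum(make != make[_n-1])
by headroom (make): gen n_clusters = cluster_id[_N]
tempfile holding
save `holding'

set seed 1234        // set once, before the loop
local reps 5         // placeholder number of replications

forvalues r = 1/`reps' {
    // n_clusters returns from `holding' through -joinby-, so each pass
    // can start from whatever sample is currently in memory
    keep headroom n_clusters
    by headroom, sort: keep if _n <= n_clusters

    // Draw cluster ids with replacement within each stratum
    gen cluster_id = runiformint(1, n_clusters)
    drop n_clusters

    // Attach the observations of each drawn cluster
    joinby headroom cluster_id using `holding'

    // Estimation on the resampled data would go here
}
```

    Keeping -set seed- outside the loop matters: resetting it inside would reproduce the same sample on every pass.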

    Finally, although I can see no statistical reason why bootstrap sampling with a stratum having a singleton cluster should be illegitimate, my understanding of the bootstrap method is incomplete, and there may be some reason why the standard errors calculated using bootstrap in this circumstance are not valid. I raise this point because usually when Stata won't let you do something, there is a good reason why it shouldn't be done.


    • #3
      Clyde Schechter, thank you so much. Your suggestion is very intuitive and powerful. For context, I was trying to mimic the exact sampling procedure of a survey data set when calculating bootstrap standard errors, rather than using the common nonparametric clustered bootstrap. However, in my final data set, some strata (regions) include only one school (cluster). As you said, there may be a good reason why Stata won't let me do this.
      Last edited by Minch Park; 22 Dec 2024, 23:26.
