  • Stratifying a skewed variable into simulations

    Hi. I am trying to randomize medical providers into different treatment arms and am working with baseline data that tracks the number of consultations done by week (the outcome/dependent variable).

    1. I want to stratify the outcome variable into quintiles, but the data are highly skewed (shown below). Is there any way for me to still stratify into bins with equal numbers of observations? I tried the xtile command, but it did not give five equal groups (shown below).
    2. After I stratify, I am struggling to include this stratification variable in the simulation code below.
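    A quick way to see why xtile cannot give five equal groups here is to look at the cutpoints it uses. In the posted output, group 1 holds 1,046 of 1,935 observations (about 54%), all of them zeros. xtile puts a value equal to a cutpoint in the lower group, so once the 20th and 40th percentiles are both 0, group 2 would need 0 < x <= 0, which nothing satisfies. The sketch below reproduces this on toy data (the generated OCbyWeek is invented to mimic the shape described, not the poster's data); on the real data you would skip the first four lines.

```stata
clear
set seed 2024
set obs 1935
* Toy data only (not the poster's data): about 54% zeros plus a skewed tail
generate OCbyWeek = cond(runiform() < 0.54, 0, 1 + rpoisson(4))

* Cutpoints that xtile would use for 5 groups
_pctile OCbyWeek, nquantiles(5)
display "cutpoints: " r(r1) ", " r(r2) ", " r(r3) ", " r(r4)

* With more than 40% zeros, the 20th and 40th percentiles are both 0,
* so every zero lands in group 1 and group 2 is left empty
xtile quintile = OCbyWeek, nquantiles(5)
tabulate quintile
```

    Identical values cannot be split between bins without breaking ties arbitrarily, so no choice of cutpoints gives five equal bins on these data.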

    Any help on these will be greatly appreciated.



    Code:
                           (max) OCbyWeek
    -------------------------------------------------------------
          Percentiles      Smallest
     1%            0              0
     5%            0              0
    10%            0              0       Obs               1,935
    25%            0              0       Sum of Wgt.       1,935
     
    50%            0                      Mean           2.749871
                            Largest       Std. Dev.      4.435621
    75%            4             27
    90%            9             30       Variance       19.67473
    95%           12             30       Skewness       2.156134
    99%           18             36       Kurtosis       8.907038
    
    
    . xtile quintile = OCbyWeek, nq(5)
    
    . tab quintile, sum( OCbyWeek )
    
    5 quantiles |      Summary of (max) OCbyWeek
    of OCbyWeek |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
              1 |           0           0       1,046
              3 |           1           0         145
              4 |   3.7553444   1.4291193         421
              5 |   11.130031    4.218614         323
    ------------+------------------------------------
          Total |   2.7498708   4.4356212       1,935
    
    
    Simulation code:
    pc_simulate OCbyWeek, model(ANCOVA) mde(0.45) i(Consultantid) t(week_gen) n(95) bootstrap p(0.5) pre(5) post(12) alpha(0.05) nsim(1000) vce(robust) outfile(powercalcs.csv)
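    The stratification variable matters mainly at the assignment step: providers are randomized to arms within each stratum. A minimal sketch of that step follows, assuming two arms, strata built from each provider's mean baseline OCbyWeek, and the identifiers Consultantid and week_gen from the post. The toy panel, and the names base_oc, stratum, u, treat and arms, are hypothetical. It does not touch pc_simulate, and whether that program accepts a strata variable is not covered here.

```stata
clear
set seed 2024
* Toy panel only (not the poster's data): 95 hypothetical providers,
* 5 baseline weeks each
set obs 95
generate long Consultantid = _n
expand 5
bysort Consultantid: generate int week_gen = _n
generate OCbyWeek = cond(runiform() < 0.54, 0, 1 + rpoisson(4))

* One row per provider: mean baseline weekly consultations (an assumed summary)
preserve
collapse (mean) base_oc = OCbyWeek, by(Consultantid)

* Hypothetical strata: quintile groups of the provider-level mean
xtile stratum = base_oc, nquantiles(5)

* Random order within each stratum, then alternate arms 1, 0, 1, 0, ...
generate double u = runiform()
bysort stratum (u): generate byte treat = mod(_n, 2)

* Keep the assignment and attach it to every provider-week row
keep Consultantid stratum treat
tempfile arms
save `arms'
restore
merge m:1 Consultantid using `arms', nogenerate assert(match)
```

    Alternating arms after a random sort keeps each stratum balanced to within one provider. Averaging over weeks before binning also breaks many of the ties that defeat xtile at the provider-week level.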

  • #2
    There is no good news here that you can't see.

    You need to give something up. You can give up the ideal of equal frequencies and an optimist would point out that even the bin with fewest observations has 145 observations. But the top bin is then not at all homogeneous.
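    One way to give up equal frequencies while making the upper bins more homogeneous is to give all zeros their own stratum and split only the positive values. This is a compromise of my own for illustration, not something proposed in the thread; the toy data and the names pos_q and stratum are hypothetical.

```stata
clear
set seed 2024
set obs 1935
* Toy data only (not the poster's data): about 54% zeros plus a skewed tail
generate OCbyWeek = cond(runiform() < 0.54, 0, 1 + rpoisson(4))

* Stratum 0 holds every zero; strata 1-4 are quartile groups of the
* positive values only, so the upper tail is split more finely
xtile pos_q = OCbyWeek if OCbyWeek > 0, nquantiles(4)
generate byte stratum = cond(OCbyWeek == 0, 0, pos_q)

* Frequencies are unequal by design; check the spread within each stratum
tabulate stratum, summarize(OCbyWeek)
```

    Ties among small positive counts can still leave the positive groups unequal, so the tabulation is worth checking before using the strata.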

    You might get a slightly better outcome by starting at the other end as explained in Section 4 of https://journals.sagepub.com/doi/pdf...867X1201200413 but more than half the observations will end up in the 0 bin, whatever you do. See also https://journals.sagepub.com/doi/pdf...867X1801800311 particularly the incisive references in Section 1.
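    One reading of "starting at the other end" is to bin the negated variable and flip the group labels, so that a value tied with a cutpoint goes to the upper bin rather than the lower one. This is a sketch of that idea, not necessarily the exact method in the cited paper. It also counts the zeros: any share above 0.5 means some bin must hold more than half the data. The toy data and the names negOC, from_top and from_bottom are hypothetical.

```stata
clear
set seed 2024
set obs 1935
* Toy data only (not the poster's data): about 54% zeros plus a skewed tail
generate OCbyWeek = cond(runiform() < 0.54, 0, 1 + rpoisson(4))

* Share of zeros: above 0.5 means one bin must hold over half the data
count if OCbyWeek == 0
display "share of zeros: " %5.3f r(N)/_N

* Bin the negated variable, then flip labels so 5 = highest values;
* values tied with a cutpoint now go to the upper bin
generate double negOC = -OCbyWeek
xtile from_top = negOC, nquantiles(5)
replace from_top = 6 - from_top

* Compare with the usual bottom-up binning
xtile from_bottom = OCbyWeek, nquantiles(5)
tabulate from_bottom from_top
```

    On data shaped like these, the zeros end up sharing a bin with some small positive values instead of filling bin 1 alone, but that bin still holds more than half the observations.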

    I have no experience or expertise on the program you want to run and so can't comment helpfully on that.



    • #3
      Thanks, Nick. That makes sense, as do the papers.
