Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Simulate/bsample stop working after 21 reps (out of 1000 expected)

    Hi,

    Just thought I'd try to send this on here, maybe anyone has suggestions on how to solve this. I have encountered a problem, I created a program to simulate bootstrapped samples from a mixed regression and some healthcare costs, to get one of those cost-effectiveness planes with the bootstrapped "cloud". However, the simulation works only for the first 21 repetitions.

    I have searched a lot for any potential similar posts (here and google etc), but I think if there are any these stay "hidden" cause I found no great combination of words to use in the search, end up with a lot of other stuff.

    Code:
    preserve
    capture program drop twomeans
    program define twomeans, rclass
    bsample
       qui mixed EQ5Dscore_ time##i.Intervention || lopnr: time, cov(un)
       margins Intervention, at(time=(0 1 2 6 12 24)) vsquish predict() post
        matrix list e(b)
        matrix alpha = e(b)
        return scalar G1T0=alpha[1,1]
        return scalar G1T1=alpha[1,3]
        return scalar G1T2=alpha[1,5]
        return scalar G1T6=alpha[1,7]
        return scalar G1T12=alpha[1,9]
        return scalar G1T24=alpha[1,11]
        return scalar G2T0=alpha[1,2]
        return scalar G2T1=alpha[1,4]
        return scalar G2T2=alpha[1,6]
        return scalar G2T6=alpha[1,8]
        return scalar G2T12=alpha[1,10]
        return scalar G2T24=alpha[1,12]
        qui regress totalcost i.Intervention if time==1
        matrix list e(b)
        matrix beta = e(b)
        return scalar G2=beta[1,2]
        return scalar constant=beta[1,3]
        exit
    end
    
    simulate diff=r(G2) constant=r(constant) G1T0=r(G1T0) G1T1=r(G1T1) G1T2=r(G1T2) G1T6=r(G1T6) G1T12=r(G1T12) G1T24=r(G1T24) G2T0=r(G2T0) G2T1=r(G2T1) G2T2=r(G2T2) G2T6=r(G2T6) G2T12=r(G2T12) G2T24=r(G2T24), reps(1000) seed(5678) saving(Manuscript - Adherence PAK\Analyses\Temp\cloud_all_uk.dta, replace): twomeans
    As you can see from the code, I try to do all the bootstrap simulations in one go, both costs (variable: totalcost) and health-related quality of life (HRQoL, variable: EQ5Dscore_). 'totalcost' is what it sounds like, a total cost, the same in all rows so to ensure I do not use multiple rows for each person in the dataset I have selected to only use the totalcost for the row indicating time==1. However, this is panel data, so HRQoL was measured at multiple timepoints for each patient, and therefore I want to use a longitudinal mixed model for the QALY-estimation.

    From that, the simulations start running (very slowly, as it should), but after the first 21 repetitions it speeds up and just give me those red x's indication failed repetition. This problem has occurred both when doing only 30 repetitions and when doing the full set of 1000 repetitions:

    Simulations (1,000)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    .....................xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 50
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 100
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 150
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 200
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 250
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 300
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 350
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 400
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 450
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 500
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 550
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 600
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 650
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 700
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 750
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 800
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 850
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 900
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 950
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1,000


    A colleague who tried to use a simular code (we developed it together, so very similar code but completely other dataset) had the exact same issue.

    Have anyone tried to do something similar or had a similar problem turn up with simulations/bsample? Can you see from what I post here already what is the issue, maybe?

    Of course also super thankful for any other comments/suggestions/guidance to improve the code or my set up for this analysis.

    Kind regards,
    Hanna
    Last edited by Hanna Gyllensten; 21 Dec 2021, 08:00.

  • #2
    Have you tried adding the noisily option to the simulate command to see if anything informative is output in the 22nd repetition and later?

    Comment


    • #3
      Very good idea, thanks, had not thought of that for simulate (often use it in "normal" bootstrap if there are problems). It says "initial values not feasible
      captured error running (twomeans), posting missing values". I guess its the mixed regression that stops it, cause nothing is posted for that "round" (or I assume it would have posted the mixed-results but not the cost-stuff?). I've seen other posts with the same error reported after simulate (other regressions but still), with responses suggesting to start with a less complicated model but these models work just fine outside of the simulation and there is really no covariates in them except the intervention group and time variable (which is necessary to handle that its panel data), and solutions to enable slope and intercept to vary, like a really basic mixed-regression.

      The dataset is, if its useful to know some more about it, patient-level data from a randomized controlled trial where patients were either receiving person-centred care or "care as usual", and we followed costs using administrative registers (so no missing data on costs) and they responded to the Euroqol/EQ-5D health state questionnaire at the stated time points (0, 1, 2, 6, 12, and 24 months). However, the self-reported data, as often is the case, does have missing data issues. We are assuming its "fairly" random, in particular since for this specific analysis we already excluded the patients who died during the study period (small percentage in both groups, and we need a complete follow-up period here due to looking also into refill adherence for some specific medications). I have previously used multiple imputation for this type of analysis but a couple of years ago there was a paper suggesting to use these longitudinal methods for analysing EQ-5D data instead, which appears to work very well on its own, but here is where I get stuck as soon as I try to combine it with anything to do with costs. The problems with MI was mainly that it takes a lot of time to run, really, a lot. Don't know if its related to our data all being in a central server, but last I tried my computer closed down after three days due to some mandatory maintenance/update, and at that time it had still not run all the bootstraps and consecutive imputations.

      Any suggestions on improvements, other solutions or reading to get past this issue would be highly appreciated. I just find it so strange that it works fine first, and then suddenly goes crazy and only gives the same error round after round.

      Kind regards,
      Hanna

      Comment


      • #4
        You may have already seen this but, it seems to me that you need a "preserve/restore" within the loop, otherwise, your sample is just deteriorating:

        Code:
        You currently have this:
        program example
        bsample
        reg y x
        end
        simulate, reps(100): example
        
        However, I think it should be like this:
        
        program example2
        preserve
        bsample
        reg y x
        restore
        end
        simulate, reps(100): example

        Comment


        • #5
          In trying to diagnose a problem with a bootstrap or similar procedure, I always use -set seed SomeNumber- before the procedure so that I can reproduce the error and thereby better track it. Trying different seeds offers a relatively simplistic way to see if the problem depends on a random data configuration, rather than something systematic about the program. Such a use of -set seed- might already have been tried here, but in case not, I'll offer the suggestion.

          Comment


          • #6
            Thanks for all suggestions, I used them all!! In the end, the solution was very much in line with the suggestion by FernandoRios above, so the code that worked fine is now:
            Code:
            capture program drop twomeans
            program define twomeans, rclass
            preserve
            bsample
               qui mixed EQ5Dscore_ time##i.Intervention || lopnr: time, cov(un)
               margins Intervention, at(time=(0 1 2 6 12 24)) vsquish predict() post
                matrix list e(b)
                matrix alpha = e(b)
                return scalar G1T0=alpha[1,1]
                return scalar G1T1=alpha[1,3]
                return scalar G1T2=alpha[1,5]
                return scalar G1T6=alpha[1,7]
                return scalar G1T12=alpha[1,9]
                return scalar G1T24=alpha[1,11]
                return scalar G2T0=alpha[1,2]
                return scalar G2T1=alpha[1,4]
                return scalar G2T2=alpha[1,6]
                return scalar G2T6=alpha[1,8]
                return scalar G2T12=alpha[1,10]
                return scalar G2T24=alpha[1,12]
                qui regress totalcost i.Intervention if time==0
                matrix list e(b)
                matrix beta = e(b)
                return scalar G2=beta[1,2]
                return scalar constant=beta[1,3]
            restore
            end
            
            simulate diff=r(G2) constant=r(constant) G1T0=r(G1T0) G1T1=r(G1T1) G1T2=r(G1T2) G1T6=r(G1T6) G1T12=r(G1T12) G1T24=r(G1T24) G2T0=r(G2T0) G2T1=r(G2T1) G2T2=r(G2T2) G2T6=r(G2T6) G2T12=r(G2T12) G2T24=r(G2T24), reps(1000) seed(5678) saving(Analyses\cloud_all_uk.dta, replace): twomeans
            We have now tried this code out in several datasets and it seems to work fine. Thanks again all for your help!

            For those reading afterwards and being curious about the data and what I was trying to achieve: I use a dataset in long format with multiple observations of health-related quality of life (EQ-5D) per individual. Id-variable is 'lopnr'. Costs are a clump sum, and I now put that in one row per individual (at time==0). My aim is to in the end be able to create the cost-effectiveness plane "cloud graph", with one thousand bootstrapped simulated samples, as a sensitivity analysis comparing our study intervention to "usual care". So to create that graph I continue working from the file that is created in the simulate-code.

            Comment

            Working...
            X