Simulate/bsample stop working after 21 reps (out of 1000 expected)

Hanna Gyllensten

Join Date: Sep 2016

Posts: 19
#1

Simulate/bsample stop working after 21 reps (out of 1000 expected)

21 Dec 2021, 07:56

Hi,

Just thought I'd try to send this on here, maybe anyone has suggestions on how to solve this. I have encountered a problem, I created a program to simulate bootstrapped samples from a mixed regression and some healthcare costs, to get one of those cost-effectiveness planes with the bootstrapped "cloud". However, the simulation works only for the first 21 repetitions.

I have searched a lot for any potential similar posts (here and google etc), but I think if there are any these stay "hidden" cause I found no great combination of words to use in the search, end up with a lot of other stuff.

Code:

preserve capture program drop twomeans program define twomeans, rclass bsample qui mixed EQ5Dscore_ time##i.Intervention || lopnr: time, cov(un) margins Intervention, at(time=(0 1 2 6 12 24)) vsquish predict() post matrix list e(b) matrix alpha = e(b) return scalar G1T0=alpha[1,1] return scalar G1T1=alpha[1,3] return scalar G1T2=alpha[1,5] return scalar G1T6=alpha[1,7] return scalar G1T12=alpha[1,9] return scalar G1T24=alpha[1,11] return scalar G2T0=alpha[1,2] return scalar G2T1=alpha[1,4] return scalar G2T2=alpha[1,6] return scalar G2T6=alpha[1,8] return scalar G2T12=alpha[1,10] return scalar G2T24=alpha[1,12] qui regress totalcost i.Intervention if time==1 matrix list e(b) matrix beta = e(b) return scalar G2=beta[1,2] return scalar constant=beta[1,3] exit end simulate diff=r(G2) constant=r(constant) G1T0=r(G1T0) G1T1=r(G1T1) G1T2=r(G1T2) G1T6=r(G1T6) G1T12=r(G1T12) G1T24=r(G1T24) G2T0=r(G2T0) G2T1=r(G2T1) G2T2=r(G2T2) G2T6=r(G2T6) G2T12=r(G2T12) G2T24=r(G2T24), reps(1000) seed(5678) saving(Manuscript - Adherence PAK\Analyses\Temp\cloud_all_uk.dta, replace): twomeans

As you can see from the code, I try to do all the bootstrap simulations in one go, both costs (variable: totalcost) and health-related quality of life (HRQoL, variable: EQ5Dscore_). 'totalcost' is what it sounds like, a total cost, the same in all rows so to ensure I do not use multiple rows for each person in the dataset I have selected to only use the totalcost for the row indicating time==1. However, this is panel data, so HRQoL was measured at multiple timepoints for each patient, and therefore I want to use a longitudinal mixed model for the QALY-estimation.

From that, the simulations start running (very slowly, as it should), but after the first 21 repetitions it speeds up and just give me those red x's indication failed repetition. This problem has occurred both when doing only 30 repetitions and when doing the full set of 1000 repetitions:

Simulations (1,000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.....................xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 50
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 100
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 150
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 200
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 250
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 300
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 350
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 400
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 450
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 500
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 550
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 600
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 650
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 700
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 750
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 800
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 850
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 900
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 950
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 1,000

A colleague who tried to use a simular code (we developed it together, so very similar code but completely other dataset) had the exact same issue.

Have anyone tried to do something similar or had a similar problem turn up with simulations/bsample? Can you see from what I post here already what is the issue, maybe?

Of course also super thankful for any other comments/suggestions/guidance to improve the code or my set up for this analysis.

Kind regards,
Hanna

Last edited by Hanna Gyllensten; 21 Dec 2021, 08:00.
Tags: None
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

21 Dec 2021, 11:57

Have you tried adding the noisily option to the simulate command to see if anything informative is output in the 22nd repetition and later?
Comment
Hanna Gyllensten

Join Date: Sep 2016

Posts: 19
#3

21 Dec 2021, 13:17

Very good idea, thanks, had not thought of that for simulate (often use it in "normal" bootstrap if there are problems). It says "initial values not feasible
captured error running (twomeans), posting missing values". I guess its the mixed regression that stops it, cause nothing is posted for that "round" (or I assume it would have posted the mixed-results but not the cost-stuff?). I've seen other posts with the same error reported after simulate (other regressions but still), with responses suggesting to start with a less complicated model but these models work just fine outside of the simulation and there is really no covariates in them except the intervention group and time variable (which is necessary to handle that its panel data), and solutions to enable slope and intercept to vary, like a really basic mixed-regression.

The dataset is, if its useful to know some more about it, patient-level data from a randomized controlled trial where patients were either receiving person-centred care or "care as usual", and we followed costs using administrative registers (so no missing data on costs) and they responded to the Euroqol/EQ-5D health state questionnaire at the stated time points (0, 1, 2, 6, 12, and 24 months). However, the self-reported data, as often is the case, does have missing data issues. We are assuming its "fairly" random, in particular since for this specific analysis we already excluded the patients who died during the study period (small percentage in both groups, and we need a complete follow-up period here due to looking also into refill adherence for some specific medications). I have previously used multiple imputation for this type of analysis but a couple of years ago there was a paper suggesting to use these longitudinal methods for analysing EQ-5D data instead, which appears to work very well on its own, but here is where I get stuck as soon as I try to combine it with anything to do with costs. The problems with MI was mainly that it takes a lot of time to run, really, a lot. Don't know if its related to our data all being in a central server, but last I tried my computer closed down after three days due to some mandatory maintenance/update, and at that time it had still not run all the bootstraps and consecutive imputations.

Any suggestions on improvements, other solutions or reading to get past this issue would be highly appreciated. I just find it so strange that it works fine first, and then suddenly goes crazy and only gives the same error round after round.

Kind regards,
Hanna
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2411
#4

21 Dec 2021, 13:33

You may have already seen this but, it seems to me that you need a "preserve/restore" within the loop, otherwise, your sample is just deteriorating:

Code:

You currently have this: program example bsample reg y x end simulate, reps(100): example However, I think it should be like this: program example2 preserve bsample reg y x restore end simulate, reps(100): example
1 like
Comment
Mike Lacy

Join Date: Apr 2014

Posts: 2396
#5

21 Dec 2021, 17:04

In trying to diagnose a problem with a bootstrap or similar procedure, I always use -set seed SomeNumber- before the procedure so that I can reproduce the error and thereby better track it. Trying different seeds offers a relatively simplistic way to see if the problem depends on a random data configuration, rather than something systematic about the program. Such a use of -set seed- might already have been tried here, but in case not, I'll offer the suggestion.
1 like
Comment

Hanna Gyllensten

Join Date: Sep 2016
Posts: 19

17 Mar 2022, 02:43

Thanks for all suggestions, I used them all!! In the end, the solution was very much in line with the suggestion by FernandoRios above, so the code that worked fine is now:

Code:

capture program drop twomeans
program define twomeans, rclass
preserve
bsample
   qui mixed EQ5Dscore_ time##i.Intervention || lopnr: time, cov(un)
   margins Intervention, at(time=(0 1 2 6 12 24)) vsquish predict() post
    matrix list e(b)
    matrix alpha = e(b)
    return scalar G1T0=alpha[1,1]
    return scalar G1T1=alpha[1,3]
    return scalar G1T2=alpha[1,5]
    return scalar G1T6=alpha[1,7]
    return scalar G1T12=alpha[1,9]
    return scalar G1T24=alpha[1,11]
    return scalar G2T0=alpha[1,2]
    return scalar G2T1=alpha[1,4]
    return scalar G2T2=alpha[1,6]
    return scalar G2T6=alpha[1,8]
    return scalar G2T12=alpha[1,10]
    return scalar G2T24=alpha[1,12]
    qui regress totalcost i.Intervention if time==0
    matrix list e(b)
    matrix beta = e(b)
    return scalar G2=beta[1,2]
    return scalar constant=beta[1,3]
restore
end

simulate diff=r(G2) constant=r(constant) G1T0=r(G1T0) G1T1=r(G1T1) G1T2=r(G1T2) G1T6=r(G1T6) G1T12=r(G1T12) G1T24=r(G1T24) G2T0=r(G2T0) G2T1=r(G2T1) G2T2=r(G2T2) G2T6=r(G2T6) G2T12=r(G2T12) G2T24=r(G2T24), reps(1000) seed(5678) saving(Analyses\cloud_all_uk.dta, replace): twomeans

We have now tried this code out in several datasets and it seems to work fine. Thanks again all for your help!

For those reading afterwards and being curious about the data and what I was trying to achieve: I use a dataset in long format with multiple observations of health-related quality of life (EQ-5D) per individual. Id-variable is 'lopnr'. Costs are a clump sum, and I now put that in one row per individual (at time==0). My aim is to in the end be able to create the cost-effectiveness plane "cloud graph", with one thousand bootstrapped simulated samples, as a sensitivity analysis comparing our study intervention to "usual care". So to create that graph I continue working from the file that is created in the simulate-code.

Announcement

Simulate/bsample stop working after 21 reps (out of 1000 expected)

Comment

Comment

Comment

Comment

Comment