  • bootstrapping with multiple imputation on components of derived variables

    Hi,

    I am working with cost and quality-of-life data for a cost-effectiveness analysis. Some values are missing, and I need to impute them before deriving QALYs from quality of life (qol) and life years. I have the following code:

    Code:
    mi set wide
    mi stset, clear
    mi register imputed qol1 qol2 qol3 qol4 qol5 cost1 cost2 cost3 cost4 cost5
    mi register regular ID sex treatment age qol0 survival_time censored date0 date1 date2 date3 date4 date5 ///
       event0 event1 event2 event3 event4 event5 flag obs_to_drop mean_time_exp mean_time_exp_years ///
       mean_time_logl mean_time_logl_years mean_time_logn mean_time_logn_years med_time_gom med_time_gom_years ///
       mean_time_wei mean_time_wei_years
    
    // impute the missing data
    mi impute chained (pmm,knn(3)) qol1 qol2 qol3 qol4 qol5 cost1 cost2 cost3 cost4 cost5 = sex age qol0 ///
       mean_time_logn_years, by(treatment) replace add(5)
    
    // calculate qalys for each imputation
    forval y=1/5 {
        local _`y'_qol1 "((qol0 + _`y'_qol1)/2 * (date1-date0)/365.25)"
    *    local _`y'_qol1_d "((0 + _`y'_qol1)/2 * (date1-date0)/365.25)"
        local _`y'_qol2 "((_`y'_qol1 + _`y'_qol2)/2 * (date2-date1)/365.25)"
        local _`y'_qol2_d "((_`y'_qol1 + 0)/2 * (date2-date1)/365.25)"
        local _`y'_qol3 "((_`y'_qol2 + _`y'_qol3)/2 * (date3-date2)/365.25)"
        local _`y'_qol3_d "((_`y'_qol2 + 0)/2 * (date3-date2)/365.25)"
        local _`y'_qol4 "((_`y'_qol3 + _`y'_qol4)/2 * (date4-date3)/365.25)"
        local _`y'_qol4_d "((_`y'_qol3 + 0)/2 * (date4-date3)/365.25)"
        local _`y'_qol5 "((_`y'_qol4 + _`y'_qol5)/2 * (date5-date4)/365.25)"
        local _`y'_qol5_d "((_`y'_qol4  + 0)/2 * (date5-date4)/365.25)"
        local _`y'_qol6 "((_`y'_qol5 + 0)/2 * ((mean_time_logn+date0)-date5)/365.25)"
        
        gen _`y'_qaly=.
        // nobody dead at 1.
        replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2_d' if event1==0 & event2==1 // dead at 2
        replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2'+ `_`y'_qol3_d' if event2==0 & event3==1 // dead at 3
        replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2' + `_`y'_qol3' + `_`y'_qol4_d' if event3==0 & event4==1 // dead at 4
        replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2' + `_`y'_qol3' + `_`y'_qol4' + `_`y'_qol5_d' if event4==0 & event5==1 // dead at 5
        replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2' + `_`y'_qol3' + `_`y'_qol4' + `_`y'_qol5' + `_`y'_qol6' if event5==0 // censored
    }
    // gen total costs
    forval y=1/5 {
        egen _`y'_totcost = rowtotal(_`y'_cost1 _`y'_cost2 _`y'_cost3 _`y'_cost4 _`y'_cost5)  
    }
    I then need to bootstrap the coefficients from a sureg model and thought about the following:

    Code:
    program define myboot, rclass
        mi set wide
    *    mi stset, clear
        mi register passive qaly totcost
        mi register imputed qol1 qol2 qol3 qol4 qol5 cost1 cost2 cost3 cost4 cost5
        mi register regular ID sex treatment age qol0 survival_time censored date0 date1 date2 date3 date4 date5 ///
            event0 event1 event2 event3 event4 event5 flag obs_to_drop mean_time_exp mean_time_exp_years ///
            mean_time_logl mean_time_logl_years mean_time_logn mean_time_logn_years med_time_gom med_time_gom_years ///
            mean_time_wei mean_time_wei_years
    
        mi estimate, cmdok: sureg (totcost age i.sex i.treatment) (qaly age i.sex i.treatment)
        return scalar b_q01 = el(e(b_mi),1,1)  // still need to specify the exact elements correctly in these few lines...
        return scalar b_q02 = el(e(b_mi),1,1)
        return scalar b_c01 = el(e(b_mi),1,1)
        return scalar b_c02 = el(e(b_mi),1,1)
    end
    * Save bootstrapped coefficients
    bootstrap b_qaly01=r(b_q01) b_qaly02=r(b_q02) b_cost01=r(b_c01) b_cost02=r(b_c02), reps(1000) saving("$working\bootstrap_results.dta", replace): myboot
    The problem seems to be that in the mi register passive command, the variable qaly is not found (even though I have 5 variables named _1_qaly _2_qaly _3_qaly _4_qaly _5_qaly in the data according to each imputation). I am not sure what has gone wrong here as I would have thought Stata would pick up the imputations by the variable names.

    Grateful for any help with this.
    Last edited by Jane Fry; 27 Feb 2024, 07:50.

  • #2
    I am a bit confused by the code, since you seem to run parts of the imputation both before and inside the program. I am not sure this works as you intend. I have suggested two general approaches to bootstrapping with imputation in Stata; I hope this gives you an example of how you can structure your code. https://www.preprints.org/manuscript/202401.0813/v1
    Best wishes

    (Stata 16.1 MP)

    • #3
      Felix Bittmann Thank you. Your paper was my original source for this work! Kind regards, Jane.

      • #4
        I see that you are then familiar with my approach, great! However, I am not sure if the current code is BootImpute or ImputeBoot. In the first part you apparently impute the data first and then apply the bootstrapping later, so this would be the ImputeBoot strategy. However, this would require you to take a bootstrap resample using bsample, which I do not find in your program (see page 6 in my PDF). You should not impute first and then simply use the bootstrap prefix as this command does not "know" about the imputed data structure. You also use the wide imputation format instead of flong as I suggest. Overall I am therefore not sure if this approach will give you valid results, even if the code runs. Potentially, in your current program, a simple bug might cause the error. Especially the part where you generate many locals is a bit convoluted and could easily be responsible for the error. Without having the data, checking this is not really possible. What you can do is run the code noisily or with "trace" to see what happens and why a variable is not found.
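        A minimal sketch of that debugging step. The program demo_fail is hypothetical and only stands in for myboot: it deliberately references a variable that does not exist, so the trace shows the line on which the error occurs.

        ```stata
        * Hypothetical stand-in for myboot: fails on a variable that does not exist
        capture program drop demo_fail
        program define demo_fail, rclass
            summarize price
            return scalar mean_price = r(mean)
            summarize no_such_var          // variable not found, error r(111)
        end

        sysuse auto, clear

        set tracedepth 1                   // trace only the top level of the program
        set trace on
        capture noisily demo_fail          // show output and the error, but keep going
        local rc = _rc                     // store the return code straight away
        set trace off

        display "return code: `rc'"        // 111 means a variable was not found
        ```

        The last traced line before the error message is the command that failed, which is usually enough to see which variable name Stata was actually looking for.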

        EDIT: I have thought a bit more about imputing in wide format and I think your approach is sound. I did not discuss this in the paper, since it is usually not feasible with many variables and imputations, but it should work. The error is then a separate issue; I bet you will find it when tracing the program.
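
        One possible reading of the error, offered as a sketch rather than a diagnosis: mi register passive expects a variable named qaly to exist in the original data (m=0), and hand-made variables such as _1_qaly are not recognised by mi as imputations of it. Letting mi passive create the derived variable avoids this. The data below are simulated, and every variable name and the simplified QALY formula are assumptions made for illustration.

        ```stata
        * Simulated data: qol1 has missing values, years is a hypothetical follow-up length
        clear
        set seed 12345
        set obs 200
        generate age   = 40 + int(30*runiform())
        generate qol0  = 0.5 + 0.3*runiform()
        generate qol1  = qol0 - 0.001*age + rnormal(0, 0.05)
        generate years = 1
        replace  qol1  = . if runiform() < 0.2

        mi set wide
        mi register imputed qol1
        mi register regular age qol0 years
        mi impute regress qol1 = age qol0, add(5) rseed(2024)

        * mi passive creates qaly in m=0 and in every imputation (_1_qaly ... _5_qaly)
        * and registers it as passive, so no separate mi register passive is needed
        mi passive: generate qaly = (qol0 + qol1)/2 * years

        mi describe
        mi estimate: regress qaly age
        ```

        Conditional pieces, such as the different formulas by time of death, could be added in the same way with mi passive: replace qaly = ... if ....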
        Last edited by Felix Bittmann; 28 Feb 2024, 04:20.
        Best wishes

        (Stata 16.1 MP)

        • #5
          Felix Bittmann
          The code was meant to be BootImpute. My complication is that I have to impute quality of life (and costs, but don't worry about that for now), from which I then calculate QALYs (= qol * life years, where life years come from the survival analysis). QALY is then used in a SUR model (of cost and QALY) that needs bootstrapping to estimate a set of coefficients for the treatment effect on costs and QALYs. So I figure that, in the code above, I have 5 imputed variables for qol, from which I calculate 5 (effectively imputed) QALY variables. Is this not right? I had thought about the following:


          Code:
          program define myboot, rclass
                    local cost = 0
                    local qaly = 0
                    forval x=1/5 {    // the calculated values based on each imputation
                             sureg (_`x'_totcost age i.sex i.treatment) (_`x'_qaly age i.sex i.treatment)
                             local cost = `cost' + _b[_`x'_totcost:1.treatment]
                             local qaly = `qaly' + _b[_`x'_qaly:1.treatment]
                    }
                    return scalar cost = `cost'/5
                    return scalar qaly = `qaly'/5   // (i.e. average coefficient over the imputations)
          end
          
          * Save 1000 bootstrapped coefficients for costs and qalys
          bootstrap result_cost=r(cost) result_qaly=r(qaly), reps(1000) saving("$working\bootstrap_results.dta", replace): myboot
          Would that work?
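
          For reference, a minimal sketch on simulated data of what the averaging in the program above computes: the mean of the per-imputation coefficients is the pooled point estimate under Rubin's rules, so it should match what mi estimate reports up to rounding. All variable names are hypothetical, and a single regress stands in for the sureg model. This only concerns the point estimate, not whether the bootstrap around it is set up correctly.

          ```stata
          * Simulated data with a missing outcome
          clear
          set seed 4321
          set obs 300
          generate treat = runiform() < 0.5
          generate age   = 40 + int(30*runiform())
          generate qaly  = 2 + 0.3*treat - 0.01*age + rnormal(0, 0.5)
          replace  qaly  = . if runiform() < 0.25

          mi set wide
          mi register imputed qaly
          mi register regular treat age
          mi impute regress qaly = treat age, add(5) rseed(99)

          * (a) average the treatment coefficient by hand over _1_qaly ... _5_qaly
          scalar bsum = 0
          forvalues m = 1/5 {
              quietly regress _`m'_qaly i.treat age
              scalar bsum = bsum + _b[1.treat]
          }
          display "manual average: " bsum/5

          * (b) pooled point estimate; the post option makes _b[] available afterwards
          quietly mi estimate, post: regress qaly i.treat age
          display "mi estimate:    " _b[1.treat]
          ```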
          Last edited by Jane Fry; 05 Mar 2024, 06:27.

          • #6
            If you want to go BootImpute, you need to write a program that does it all: impute the dataset, compute the results of interest, and return them. Your current program (post #5) does not impute at all. I would suggest first writing a program that does exactly what you want and returns your stats of interest. If this works fine, you can add the imputation part. If that also works fine, you can run it with the bootstrap prefix. I cannot really test this since I do not have the data, but based on your initial posts, your program could look something like this:


            Code:
            cap program drop myprog
            program define myprog, rclass
            mi set wide
            mi stset, clear
            mi register imputed qol1 qol2 qol3 qol4 qol5 cost1 cost2 cost3 cost4 cost5
            mi register regular ID sex treatment age qol0 survival_time censored date0 date1 date2 date3 date4 date5 ///
               event0 event1 event2 event3 event4 event5 flag obs_to_drop mean_time_exp mean_time_exp_years ///
               mean_time_logl mean_time_logl_years mean_time_logn mean_time_logn_years med_time_gom med_time_gom_years ///
               mean_time_wei mean_time_wei_years
            
            // impute the missing data
            mi impute chained (pmm,knn(3)) qol1 qol2 qol3 qol4 qol5 cost1 cost2 cost3 cost4 cost5 = sex age qol0 ///
               mean_time_logn_years, by(treatment) replace add(5)
            
            // calculate qalys for each imputation
            forval y=1/5 {
                local _`y'_qol1 "((qol0 + _`y'_qol1)/2 * (date1-date0)/365.25)"
            *    local _`y'_qol1_d "((0 + _`y'_qol1)/2 * (date1-date0)/365.25)"
                local _`y'_qol2 "((_`y'_qol1 + _`y'_qol2)/2 * (date2-date1)/365.25)"
                local _`y'_qol2_d "((_`y'_qol1 + 0)/2 * (date2-date1)/365.25)"
                local _`y'_qol3 "((_`y'_qol2 + _`y'_qol3)/2 * (date3-date2)/365.25)"
                local _`y'_qol3_d "((_`y'_qol2 + 0)/2 * (date3-date2)/365.25)"
                local _`y'_qol4 "((_`y'_qol3 + _`y'_qol4)/2 * (date4-date3)/365.25)"
                local _`y'_qol4_d "((_`y'_qol3 + 0)/2 * (date4-date3)/365.25)"
                local _`y'_qol5 "((_`y'_qol4 + _`y'_qol5)/2 * (date5-date4)/365.25)"
                local _`y'_qol5_d "((_`y'_qol4  + 0)/2 * (date5-date4)/365.25)"
                local _`y'_qol6 "((_`y'_qol5 + 0)/2 * ((mean_time_logn+date0)-date5)/365.25)"
                
                gen _`y'_qaly=.
                // nobody dead at 1.
                replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2_d' if event1==0 & event2==1 // dead at 2
                replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2'+ `_`y'_qol3_d' if event2==0 & event3==1 // dead at 3
                replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2' + `_`y'_qol3' + `_`y'_qol4_d' if event3==0 & event4==1 // dead at 4
                replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2' + `_`y'_qol3' + `_`y'_qol4' + `_`y'_qol5_d' if event4==0 & event5==1 // dead at 5
                replace _`y'_qaly = `_`y'_qol1' + `_`y'_qol2' + `_`y'_qol3' + `_`y'_qol4' + `_`y'_qol5' + `_`y'_qol6' if event5==0 // censored
            }
            // gen total costs
            forval y=1/5 {
                egen _`y'_totcost = rowtotal(_`y'_cost1 _`y'_cost2 _`y'_cost3 _`y'_cost4 _`y'_cost5)  
            }
            
            mi estimate, cmdok: sureg (totcost age i.sex i.treatment) (qaly age i.sex i.treatment)
                return scalar b_q01 = el(e(b_mi),1,1)  // still need to specify the exact elements correctly in these few lines...
                return scalar b_q02 = el(e(b_mi),1,1)
                return scalar b_c01 = el(e(b_mi),1,1)
                return scalar b_c02 = el(e(b_mi),1,1)
            end
            
            *** Testing the program ***
            myprog
            return list
            You can use wide imputation here if you want, and your code will depend a lot on this decision. Whatever the format, the program must return the stats of interest. If and only if this works fine, you can continue with the bootstrapping.
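
            A hedged sketch of how the pieces could fit together once the program works, using simulated data and an explicit bsample loop in place of the bootstrap prefix so that every step is visible. The program toyprog, all variable names, and the 20 replications are assumptions for illustration only; a real analysis would need many more replications.

            ```stata
            * Simulated data: cost and qaly with missing values
            clear
            set seed 2024
            set obs 300
            generate treat = runiform() < 0.5
            generate age   = 40 + int(30*runiform())
            generate cost  = 1000 + 200*treat + 5*age + rnormal(0, 100)
            generate qaly  = 2 + 0.3*treat - 0.01*age + rnormal(0, 0.5)
            replace  cost  = . if runiform() < 0.2
            replace  qaly  = . if runiform() < 0.2

            * Does it all: impute, estimate, return the treatment coefficients
            capture program drop toyprog
            program define toyprog, rclass
                mi set wide
                mi register imputed cost qaly
                mi register regular treat age
                mi impute chained (regress) cost qaly = treat age, add(5)
                * post copies the pooled estimates to e(b), so they can be picked by name
                mi estimate, cmdok post: sureg (cost age i.treat) (qaly age i.treat)
                return scalar b_cost = _b[cost:1.treat]
                return scalar b_qaly = _b[qaly:1.treat]
            end

            * Bootstrap loop: resample the incomplete data, then impute inside the program
            tempfile base results
            save `base'
            tempname handle
            postfile `handle' b_cost b_qaly using `results'
            forvalues r = 1/20 {
                use `base', clear
                bsample
                quietly toyprog
                post `handle' (r(b_cost)) (r(b_qaly))
            }
            postclose `handle'

            * Percentile intervals from the replicates
            use `results', clear
            centile b_cost b_qaly, centile(2.5 97.5)
            ```

            Picking coefficients by equation and name, as in _b[qaly:1.treat], avoids having to work out element positions in e(b_mi) by hand.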
            Best wishes

            (Stata 16.1 MP)

            • #7
              Thanks. I have gone with wide for now and got it running. I was just wondering about the impute command. I think chained means it uses between-individual variation to impute missing values (so, for example, all values of the regular variables are used to impute, say, qol4). I think not using chained would use within-individual variation (the regular variables for an individual observation are used to impute that individual's qol4). Would that be a correct interpretation?

              Kind regards.
