Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • AUC response feature

    Dear Forum

    I am having problems choosing the right method for calculating the AUC for my repetitive blood samples in two intervention groups and afterwards testing the difference between the two interventions.


    I have a dataset with 160 records.
    These patients have undergone intervention A or B.
    I have blood sample Results from "admission to hospital", "from reaching their target intervention in the ward", after 6, 12, 18, 24, 30, 36, 42, 48, 72 and 96 hours after "reaching their target intervention in the ward" (of cause a few of the samples are accidentally missing). The blood samples have been drawn up to +/- 2 hours from protocolled time.
    I have my SampleTime as double but I have also calculated timestamps with reference to "reaching their target intervention in the ward" which of cause results in some negative time values.


    I have tried the PK functions (pkexamine and pksumm) but keep on getting errors like:
    -time variables takes on repeated values
    -.....fit() is greater than the number of obs. with __000002==6 & __000001
    insufficient observations


    and I have tried:
    integ Result(interventionA) time
    integ Result(interventionB) time

    twoway (line Result(interventionA) time)(line Result(interventionB) time)(bar ub(interventionA) lb(interventionA) time, black)(bar ub(interventionB) lb(interventionB) time, black), ///
    title("xxxx")

    This solution gives me a negative AUC in interventionA (as I am using the calculated timestamp as explained in the top)


    and I have tried:
    sort record_id time
    generate area=(Result+Result[_n+1])*(tid1[_n+1]-tid1)/2 if record_id==record_id[_n+1]
    collapse (sum) area = area (first) intervention=intervention, by(record_id)
    ttest area, by (intervention)

    The two last ones run fine but I am in doubt if either of these are the right method???

    Could someone help me step by step.
    In advance thx a lot.

    BR
    Anders

  • #2
    Hello Anders,

    Welcome to the Stata Forum. There has been some discussion on a similar topic here:http://www.statalist.org/forums/foru...eated-measures

    Hopefully that helps.

    Best,

    Marcos
    Best regards,

    Marcos

    Comment


    • #3
      Even though you're using a time variable that takes on negative values, the result of -integ- should still come out positive if the variable Result itself is always positive and if the data are sorted correctly. Also, as you have several different patients, you need to do your integration -by- patient id. Try this:

      Code:
      by patient_id time, sort: integ result time, gen(auc)
      You should get correct results for each patient with that, assuming that intervention (A vs B) is a separate variable and that there is a single Result variable used to hold the results for each group. (If you have a ResultA variable and a ResultB variable, one of them always being missing for any patient due to group assignment, then you need to do this separately for each--but that's not a very good layout for data analysis anyway, and I would advise you to -reshape long- instead.)

      Comment


      • #4
        Hi Andres,

        As Clyde suggested, you need integral function to achieve AUC. Please note this concept is different than the discussion thread refferred by Marcos (Reple#2). In fact the terminologies confuse many who are not in this field and no doubt Marcos is just being victimized of confusing jargons. However, I will quickly show how you can achieve this with your effort using 'pkexamine'. Note Clyde's code will do the same except, I show you with trapezoidal rule (reason explained below) while Clyde's one is with cubic spline. But I think Clyde's code needs correction in sorting. I don't think it should be sorted by both id and time. I think it should only be sorted only with id as you essentially want AUC for each id as a fuction of time.

        I am not sure if you have baseline and follow up meassure, but I will elaborate so that you can fit in your purpose.

        Senario: A hormone was meassured between two groups (n=58) Intervention and TAU (Treatment as usual) at baseline (before intervention) and post-intervention at 6-month and 12-months. Hormonal response was measured at half an hour interval at each visit (0m-30m-60m-90m-120m). The hypothetical data looks like below:

        Code:
        //Hypothetical data:
        
         list id time rand base six twelve in 163/198,clean
        
               id   time           rand    base     six   twelve  
        169.   29      0   Intervention   549.5     866    631.5  
        171.   29     30   Intervention     750    1010      882  
        172.   29     60   Intervention    1040    1090      849  
        173.   29     90   Intervention    1200    1030     1040  
        174.   29    120   Intervention    1430    1070      809  
        175.   30      0   Intervention   248.5       .        .  
        177.   30     30   Intervention     341       .        .  
        178.   30     60   Intervention     359       .        .  
        179.   30     90   Intervention     374       .        .  
        180.   30    120   Intervention     433       .        .  
        181.   31      0            TAU     248   370.5      287  
        183.   31     30            TAU     421     655      428  
        184.   31     60            TAU     511     673      499  
        185.   31     90            TAU     616     624      431  
        186.   31    120            TAU     638     754      430  
        
              
        ===========================xx====================
        
        // Calculate AUC with 'pkexamine' and save as variable :
        
        foreach var of varlist base six twelve {
            gen `var'auc=.
                forval i=1/58 {
                    capture pkexamine time `var' if id==`i', t  //trapezoidal rule
                    capture replace `var'auc=(`r(auc)') if id==`i'
             }
        }
        Now the dataset has three variables with total AUC for each participant at each visit. Note 'capture' is used because Stata stops calculation if there is missing values, therefore, 'capture' forces the ongoing calculation. Also please note that the default for AUC calculation is cubic spline whereas this was done with 'trapezodal rule'. The literature in this field suggests if you do not have good number of data points (as here in this case), trapezoidal rule is less biased than cubic spline. You need to check in your field which is preferred. If you don't use 't', cubic spline by default will be used. Now the dataset looks like this with the new variable having total AUC for each patient:

        Code:
               id   time           rand    base    baseauc     six     sixauc   time   twelve   twelveauc  
        169.   29      0   Intervention   549.5   120326.3     866     120450      0    631.5   103781.3  
        171.   29     30   Intervention     750   120326.3    1010     120450     30      882   103781.3  
        172.   29     60   Intervention    1040   120326.3    1090     120450     60      849   103781.3  
        173.   29     90   Intervention    1200   120326.3    1030     120450     90     1040   103781.3  
        174.   29    120   Intervention    1430   120326.3    1070     120450    120      809   103781.3  
        181.   31      0            TAU     248    60202.5   370.5   76151.25      0      287    51247.5  
        183.   31     30            TAU     421    60202.5     655   76151.25     30      428    51247.5  
        184.   31     60            TAU     511    60202.5     673   76151.25     60      499    51247.5  
        185.   31     90            TAU     616    60202.5     624   76151.25     90      431    51247.5  
        186.   31    120            TAU     638    60202.5     754   76151.25    120      430    51247.5
        You can see the overall AUC for each patient is repeated as we only saved the overall AUC for each visit. You basically need one row per id and plot them. See below:

        Code:
        egen pickone = tag(id) //pice one row per id
        drop if pickone!=1
        drop pickone base six twelve time  // we don't need them for plotting
        save test2.dta,replace // save the dataset with AUC only
        
        use test2.dta,clear
        
        ren (baseauc sixauc twelveauc) (auc1 auc6 auc12)
        reshape long auc, i(id) j(time) //reshape in long format for plotting
        
        //Plot them:
        
        lgraph auc time rand, ///
        xlab(1 "Baseline" 6 "6-month" 12 "12-month") ///
        ytitle("Mean AUC") ///
        xtitle("Visit")
        Roman

        Comment


        • #5
          Thank you all for the very warm welcome to this forum. All of your comments have been useful but I have not yet reached my final goal.

          Clyde Schechter : My data are LONG format.
          I have attached an anonymised data example with invented data but the true variables to visualise. (Grp is the intervention)

          However a modified command of the provided code in #2:

          by record_id, sort: integ Result time, gen(auc)

          was succesfull (I left out sorting for time as proposed by Roman Mostazir).
          This time I did´t get any negative AUC´s.

          Could this be modified further to fit the "trapezoidal rule" as proposed by *Roman and could you help me graph the data as the major problem is that the SampleTime/time are not always exactly the same between each record_id (drawing blood samples had to fit the "rounds", so the time can vary for up to 4 hours).
          If we should be very strict the protocolled sampling should have been "Arrival, 6, 12, 18, 24, 30, 36, 42, 48, 72 and 96 hours. In addition some patients unfortunately "leave" the study before all blood samples have been collected.

          Roman Mostazir :
          I tried to reshape to wide format but unfortunately the AUC was only generated as missing.
          I have written down my exact command (inclusive the reshaping)
          ///reshaping
          sort record_id time Result
          by record_id: gen dxno = _n
          drop SampleTime T0
          order record_id dxno time Grp Result
          reshape wide Result time, i(record_id) j(dxno)
          order record_id Grp Result*
          list record_id Grp Result*, clean

          ///pkeamine
          foreach var of varlist Result* {
          gen `var'auc=.
          forvalues i = 1/150 {
          capture pkexamine time `var' if record_id==`i', t
          capture replace `var'auc=(`r(auc)') if record_id==`i'
          }
          }

          The graph provided was exactly what I would like as well as the test for equivalent AUC in the two intervention groups.

          Again thanks a lot for all your help.

          Best regards
          Anders

          Comment


          • #6
            Originally posted by Anders Grejs View Post
            by record_id, sort: integ Result time, gen(auc)

            was succesfull (I left out sorting for time as proposed by Roman Mostazir).
            Correction: you do need 'time' to be sorted but not with the 'by' option. Sort 'time' outside the integral function. See the example below.


            Originally posted by Anders Grejs View Post
            Could this be modified further to fit the "trapezoidal rule" as proposed by *Roman
            Correction !! I did not suggest trapezoidal rule. I just warned you about existing literature which are suggestive of bias of cubic splines or other methods compared to the trapezoidal reule when you have few measurements per patient and showed you the alternative as Clyde already showed you one method. You need to search for literature in your field to find out what is suggestive for you to implement. However, you can just use the 't' option after Clyde's codes to get the trapezoidal rule implemented. Example given.

            Originally posted by Anders Grejs View Post
            If we should be very strict the protocolled sampling should have been "Arrival, 6, 12, 18, 24, 30, 36, 42, 48, 72 and 96 hours. In addition some patients unfortunately "leave" the study before all blood samples have been collected.
            Again you need to do some existing literature search to see how this is handled. One common practice is averaging initial two time points so that everyone starts from the same point. There are other methods as well. Check out your field of literature. On the other issue, if samples are dropped out at a certain meassure, by sorting the time before calculating the AUC, you get the maximum AUC available for that person. Now it is up to the research question, and statistical methods on how you want to handle the missing points to avoid bias in your estimation. Your study protocol should tell you how to handle missing values. But it is not possible to cover up all suggestions here. If you are doing an ITT analysis, comparing two groups is just fine as they are.

            Originally posted by Anders Grejs View Post

            The graph provided was exactly what I would like as well as the test for equivalent AUC in the two intervention groups.
            Example:

            Code:
            Hypothetical data:
            
            list id group time result in 70/85, clean
            
                 id   group   time   result  
             70.   10       1      6      337  
             71.   11       1      0      585  
             72.   11       1      1       81  
             73.   11       1      2       54  
             74.   11       1      3       97  
             75.   11       1      4      284  
             76.   11       1      5      224  
             77.   11       1      6      356  
             78.   12       2      0      474  
             79.   12       2      1      351  
             80.   12       2      2      415  
             81.   12       2      3      146  
             82.   12       2      4      380  
             83.   12       2      5      525  
             84.   12       2      6      196  
             85.   13       2      0      107
            
            ==================xxxxxxxxxx=================
            
            // Calculate AUC applying both rules:
            
            sort id time //sort time outside
            
            by id , sort: integ result time, gen(auc) t //trapezoidal
            by id , sort: integ result time, gen(auc2)  //cubic spline
            
            That looks like:
            
                  id   group   time   result      auc       auc2  
             70.   10       1      6      337   1591.5   1537.167  
             71.   11       1      0      585        0          0  
             72.   11       1      1       81      333   289.4614  
             73.   11       1      2       54    400.5   336.1545  
             74.   11       1      3       97      476   401.6707  
             75.   11       1      4      284    666.5   599.4128  
             76.   11       1      5      224    920.5    860.178  
             77.   11       1      6      356   1210.5   1129.625  
             78.   12       2      0      474        0          0  
             79.   12       2      1      351    412.5   390.5495  
             80.   12       2      2      415    795.5   789.8019  
             81.   12       2      3      146     1076   1063.743  
             82.   12       2      4      380     1339   1294.227  
             83.   12       2      5      525   1791.5   1779.849  
             84.   12       2      6      196     2152   2181.125  
             85.   13       2      0      107        0          0
            
            
            keep if time==6  //you only need the total AUC for each individual and can drop the other measures
            unless you have other purpose i.e. measuring AUC at different time (say at 3 or 4) for between group individuals.
            Better to save as a different dataset before dropping.
            
            // Plot:
            
            lgraph auc group, xlab(1 "Group-1" 2 "Group-2") ytitle("Mean AUC") ///
            xtitle("Group")
            By the way, could you please have a go to read the FAQ section on how to properly post in this forum using the Code blocks and how best to provide data example. That only assures you get a good response. Your provided dataset example is hardly legible and code examples could have been better.
            Last edited by Roman Mostazir; 30 Dec 2015, 13:27.
            Roman

            Comment


            • #7
              Hi Marcos, Clyde and Roman

              Thank you all for your quick and detailed replies.


              As my protocol is "Intention to treat" and the previous written literature in the field has been based on the Trapezoidal rule my final solution was:

              Code:
              keep id Grp time Result
              // Calculate AUC applying the trapezoidal rule:
              
              sort id time //sort time outside the integ
              
              by id , sort: integ Result time, gen(AUC) t //trapezoidal rule
              
              lgraph AUC time Grp //just to se the plot with all AUC(which though did´t make that much sense)
              
              by id: egen max_AUC=max(AUC) 
              keep if AUC==max_AUC //dropping all but the maximum AUC
              
              ttest AUC, by(Grp)
              
              //Plot
              lgraph AUC Grp, xlab(1 "Grp 1" 2 "Grp 2") ytitle("Mean AUC")  ///
              xtitle("Grp")

              I must say that I admire you all for spending so much time on sharing your expertise

              Happy New Year
              Anders

              Comment

              Working...
              X