  • setting my data up to calculate transitional probabilities, problem with part of the code (gen var=f.status)

    I was first using a test data set, now using a dummy dataset that clearly represents my research data

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float id str7 event float(treatment dead revised year op) long status float nextyr
     1 "op"      1 1 1 2001 1 2 .
     1 "revised" 1 1 1 2004 1 3 1
     1 "death"   1 1 1 2005 1 1 .
     2 "op"      0 0 1 2001 1 2 .
     2 "revised" 0 0 1 2007 1 3 .
    19 "op"      0 1 0 2008 1 2 .
    19 "death"   0 1 0 2016 1 1 .
    45 "op"      0 0 1 2005 1 2 .
    45 "revised" 0 0 1 2008 1 3 .
    46 "op"      1 1 0 2007 1 2 .
    46 "death"   1 1 0 2020 1 1 .
    54 "op"      1 0 0 2001 1 2 .
    76 "op"      1 1 0 2009 1 2 .
    76 "death"   1 1 0 2015 1 1 .
    89 "op"      1 1 0 2006 1 2 .
    89 "death"   1 1 0 2010 1 1 .
    end
    format %ty year
    label values treatment q1
    label def q1 0 "control", modify
    label def q1 1 "treatment", modify
    label values dead q2
    label def q2 0 "alive", modify
    label def q2 1 "dead", modify
    label values revised q3
    label def q3 0 "success", modify
    label def q3 1 "revised", modify
    label values status status
    label def status 1 "death", modify
    label def status 2 "op", modify
    label def status 3 "revised", modify

    Code:
    //// start transition probabilities
    
    // create a dataset of probabilities using the example data
    // declare the data to be panel data
    
    xtset id year, yearly
    
    // take the value of status in the following row -- as you can see from the -dataex- data, this only works for observation 2, id = 1
    
    capture drop nextyr    // nextyr is already in the -dataex- listing above
    generate nextyr=f.status
    // My question: why doesn't, e.g., observation 4 (id = 2) get nextyr = 3?
    // The same goes for id = 19: I would have expected nextyr for observation 6 to be 1.
    // Why is this not happening?




    //// Just fyi... this is my plan for the rest of the data....

    drop if missing(nextyr)
    generate f = 1

    ///it calculates the count of transitions from (status) to nextyr (new transition) within each year.
    collapse (sum) f, by(year status nextyr)

    ///This calculates the total count of transitions from each status within each year.
    bysort year status: egen all = total(f)

    //It divides the count of transitions (f) by the total count of transitions from the same starting state (all)
    //this gives the proportion of transitions to each possible next state, conditional on the current state and year.
    generate p = f/all

    // review intermediate output
    //formatting to 3 decimal places (9 characters)
    format %9.3f p
    Last edited by Rose Matthews; 18 Mar 2024, 09:04.

  • #2
    Ah, I figured out why this is not working.

    The reason is that there are gaps between the years within each id.
    The only observation that is not followed by a gap is the one marked with the red arrow in the screenshot, which is why the following code works there:

    Code:
    generate nextyr=f.status

    Any thoughts on how I can make 'nextyr' take the status of the next row within id, even when there are gaps in 'year'?

    The aim is to calculate the transition probabilities. If death or revision has not taken place, the patient is assumed to remain in the same state, 'op' (alive).

    [Image: Screenshot 2024-03-18 at 14.30.21.png]



    • #3
      So you have to create a new variable that is sequential when the data are in chronological order within id.
      Code:
      by id (year), sort: gen int seq = _n
      xtset id seq
      gen nextyr:status = F.status
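
      For comparison, a minimal sketch of an alternative that picks up the next recorded row without re-declaring the panel: explicit subscripting within id. The three-row toy data and the variable name nextstatus are made up for illustration and are not part of the thread's dataset.

```stata
* Sketch only: toy panel with a gap in year (illustrative values)
clear
input float id float year long status
1 2001 2
1 2004 3
1 2005 1
end

* Under -by-, status[_n+1] is the next row within the same id,
* regardless of how many calendar years lie in between
by id (year), sort: gen long nextstatus = status[_n+1]

list, sepby(id)
```

      Unlike F.status, this ignores the length of the gap entirely, so it answers "what was the next recorded state" rather than "what was the state one year later".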



      • #4
        Thanks for this insight. I do have an additional question, though.

        How can I differentiate between the observations that are missing in the new variable -nextyr-?

        E.g. id = 1, year 2005: nextyr = . (patient is dead and can be dropped; this is already recorded as dead in year 2004)

        VS

        id = 54, year 2001: nextyr = . (patient is alive = success story)


        (1) How can I keep those that are missing but alive (success stories) versus those that are missing but dead (exited the study)?

        (2) If I may ask another question, which I have touched upon in another post (it is perhaps the question in post #2 put in a different manner), as seen here:
        https://www.statalist.org/forums/for...-gaps-in-years

        Does it matter if there are gaps in the years when calculating transition probabilities for a Markov model?



        • #5
          Yes, it matters a great deal if there are gaps in the years when calculating transitional probabilities for a Markov model. Evidently if it takes three years to go from state A to state B, the transition probability is different than if you go from state A to state B in 1 year. Now, I don't know what program you are using to do these calculations, or if you are crafting your own code to do it. But gaps in the years should either be prohibited (i.e. you must fill in the gaps before running the program), or the code should account for the gaps in the calculation. I have never had occasion to use Stata for this particular purpose, so I can't advise you more specifically than that about this. Assuming you are using a pre-existing program, you should consult the help file to understand how it deals with this situation.
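
          To make the point about interval length concrete, here is a rough sketch with a hypothetical one-year transition matrix. The numbers in P are invented for illustration and are not estimated from the data in this thread. It also assumes a time-homogeneous Markov chain, in which case the three-year transition matrix is the one-year matrix multiplied by itself three times.

```stata
* Hypothetical one-year transition matrix (rows = from, columns = to).
* The values are made up for illustration only.
matrix P = (1, 0, 0 \ 0.10, 0.85, 0.05 \ 0.20, 0, 0.80)
matrix rownames P = death op revised
matrix colnames P = death op revised

* Assuming time homogeneity, the three-year matrix is P*P*P
matrix P3 = P*P*P

matrix list P
matrix list P3
```

          With these made-up numbers, the op-to-death probability is 0.10 over one year but about 0.284 over three years, so a transition observed across a three-year gap cannot be counted as if it were a one-year transition.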

          I don't understand your first question. The patient who died has an observation in which they show status = dead. The other one does not. That is how they differ. I don't know why you would drop the observation showing the death from your study, but perhaps that has something to do with the particulars of the program you are using to calculate transition probabilities.



          • #6
            Thanks for this. As you can see from post #1,
            that was my code for calculating transition probabilities in Stata... why else would I use other software?

            However, I have a problem that I need to account for: the year gaps. I take it from your post #5 that you don't have any further advice on how I can address the gaps, then?


            Code:
            //// start transition probabilities
            // create a dataset of probabilities using the example data
            // declare the data to be panel data
            
            xtset id year, yearly
            
            //takes the value of status in the following row -- this, as you can see from the data provided in dataex, only works for observation 2 , id = 1.
            
            generate nextyr=f.status
            
            
            //// Just fyi... this is my plan for the rest of the data....
            
            drop if missing(nextyr)
            generate f = 1
            
            ///it calculates the count of transitions from (status) to nextyr (new transition) within each year.
            collapse (sum) f, by(year status nextyr)
            
            ///This calculates the total count of transitions from each status within each year.
            bysort year status: egen all = total(f)
            
            //It divides the count of transitions (f) by the total count of transitions from the same starting state (all)
            //this gives the proportion of transitions to each possible next state, conditional on the current state and year.
            generate p = f/all
            
            // review intermediate output
            //formatting to 3 decimal places (9 characters)
            format %9.3f p



            • #7
              I hadn't looked at your code in #1, I was picking up from #2.

              I didn't think you were using other software for the transition probabilities. I thought you might have used some Stata program, perhaps something user-written, for the purpose.

              For your over-arching problem of computing transition probabilities, I think the most straightforward way to get this right is to fill in the missing years. Now, you have to make some assumption about what happened during the missing years. I think for the situations you are working with here, you can fairly assume that the status in any given year that was not directly observed is the same as the status in the most recent preceding observed year. In effect, you are assuming that you have observed all ops, revisions, and deaths: there were no such events that occurred that were not in the original data. Then from there you can calculate the transition probabilities from each state to each other state in each year by:
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input float id str7 event float(treatment dead revised year op) long status
               1 "op"      1 1 1 2001 1 2
               1 "revised" 1 1 1 2004 1 3
               1 "death"   1 1 1 2005 1 1
               2 "op"      0 0 1 2001 1 2
               2 "revised" 0 0 1 2007 1 3
              19 "op"      0 1 0 2008 1 2
              19 "death"   0 1 0 2016 1 1
              45 "op"      0 0 1 2005 1 2
              45 "revised" 0 0 1 2008 1 3
              46 "op"      1 1 0 2007 1 2
              46 "death"   1 1 0 2020 1 1
              54 "op"      1 0 0 2001 1 2
              76 "op"      1 1 0 2009 1 2
              76 "death"   1 1 0 2015 1 1
              89 "op"      1 1 0 2006 1 2
              89 "death"   1 1 0 2010 1 1
              end
              format %ty year
              label values treatment q1
              label def q1 0 "control", modify
              label def q1 1 "treatment", modify
              label values dead q2
              label def q2 0 "alive", modify
              label def q2 1 "dead", modify
              label values revised q3
              label def q3 0 "success", modify
              label def q3 1 "revised", modify
              label values status status
              label def status 1 "death", modify
              label def status 2 "op", modify
              label def status 3 "revised", modify
              
              
              //    FILLIN MISSING YEARS, CARRYING STATUS FORWARD
              xtset id year
              tsfill
              by id (year), sort: replace status = L1.status if missing(status)
              
              gen next_status:status = F1.status
              drop if missing(next_status)
              
              collapse (count) n_transitions = id, by(year status next_status)
              by year status: egen all_transitions_out = total(n_transitions)
              gen transition_probability = n_transitions/all_transitions_out
              Note: Calculating separate transition probabilities for each year relies on very scanty data. Unless you have good reasons to believe that the probabilities really do change from one calendar year to the next, I recommend just calculating a single set of transition probabilities for all years. For that, you don't even have to write all of this code. After you have done the -tsfill- and -...replace status...- commands you can just run -xttrans status- and get the results directly (as percentages, not probabilities). If you really do need different transition probabilities for each year, then I strongly recommend getting a richer data set before proceeding.




              • #8
                On further thought, rather than dropping the final observation for each id due to lack of information about the next status, if we take seriously the assumption that the original data is not missing any revisions or deaths, then it is fair to assume that for the final observation the next status will simply be the same as the current status: if there were a revision or death, there would have been another observation to show that. (I realize that it is a bit tenuous to assume that no transitions have gone unobserved, but I think that without that assumption, given all the gaps in the data, you would be in no realistic position to calculate transition probabilities at all. Although the assumption is probably not strictly true, given the particular events in question here, it is not altogether unreasonable.)

                So, if you want to go this route, the code changes slightly to:
                Code:
                // FILLIN MISSING YEARS, CARRYING STATUS FORWARD
                xtset id year
                tsfill
                by id (year), sort: replace status = L1.status if missing(status)

                // xttrans status

                isid id year, sort
                gen next_status:status = F1.status
                replace next_status = cond(!dead, status, `="death":status') ///
                    if missing(next_status)


                collapse (count) n_transitions = id, by(status next_status)
                by status: egen all_transitions_out = total(n_transitions)
                gen transition_probability = n_transitions/all_transitions_out



                • #9
                  Dear Clyde,

                  As always, thanks for your insight.
                  I've been chewing over your answers for the past 48 hours.

                  I do want to clarify what you mention here (post #7):

                  I recommend just calculating a single set of transition probabilities for all years.


                  Do you mean calculating one transition probability for treatment (1) and one for control (0) over 2003-2021 for status = alive, status = dead, and status = revised (complete dataset),

                  therefore ending up with simply 3 values (dead, alive, revised) for treatment = 1 and another 3 values for treatment = 0?

                  (This is really what I would like to carry forward into my Markov model.)

                  However, I'm not sure if I understood you correctly.




                  • #10
                    Yes, that is what I meant. Sorry for not being clearer about that.



                    • #11
                      Thank you for clarifying. Indeed, using the code below, rather than all the code I wrote above, is much simpler, considering that I just need one set of transition probabilities for treatment (1) and one for control (0) over 2003-2021 for status = alive, status = dead, and status = revised (complete dataset), therefore ending up with simply 3 values (dead, alive, revised) for treatment = 1 and another 3 values for treatment = 0.

                      I have used the code below with the dataset provided in post 1

                      Code:
                      xtset id year
                      tsfill
                      by id (year), sort: replace status = L1.status if missing(status)
                      xttrans status
                      I have obtained the following output:

                      [Image: Screenshot 2024-03-21 at 05.48.30.png]

                      Using the presentation by Peter Austin
                      https://www.stata.com/meeting/boston...14_nichols.pdf

                      I just wanted to clarify my interpretation is correct
                      Transition probability from State 1 to 2: 11.54% therefore 0.12
                      Transition probability from State 1 to 3 : 100% therefore 1.00
                      Transition probability from State 2 to 2 : 84.62%
                      Transition probability from State 2 to 3: 0.00%
                      Transition probability from State 3 to 2: 3.85%
                      Transition probability from state 3 to 3: 0.00%

                      Is this correct?

                      Q1. However what is the probability of remaining in State 1....?

                      My mistake here is that I kept both treatment and control in the same dataset, when I should have run the code in this post but instead with
                      Code:
                      keep if treatment == 1
                      
                      // This would give me the transition probabilities (as percentages) for treatment == 1; then I repeat the above for treatment == 0 (giving me the probabilities for the control)
                      Q2. Do you agree with this?

                      Appreciate your insight, many thanks



                      • #12
                        You have used the code correctly. But your interpretation of the -xttrans- output has things reversed. The cells give the probability of transition from the state in the rowstub to the state in the column header. So, the probability of transition from state 2 to state 1 is 11.54%, from state 2 to state 2 (i.e. stay in state 2) is 84.62%, and from state 2 to state 3 is 3.85%. Similarly, the probability of transition from state 3 to state 1 is 100% and to either state 2 or state 3 is 0%. As for transitions out of state 1, none were observed in the data, so nothing is reported in the -xttrans- output. This is not surprising since earlier in the thread we can see that state 1 is "dead." Dead is an absorbing state: once you are dead, you stay dead.
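
                        One way to convince yourself of the row/column reading is to rebuild the table by hand. The sketch below uses a small gap-free toy panel (illustrative values, not the thread's data) and a hypothetical variable name next_status. The row percentages from -tabulate- should agree with what -xttrans- reports.

```stata
* Sketch only: toy panel with no gaps in year (illustrative values)
clear
input float id float year long status
1 2001 2
1 2002 2
1 2003 3
1 2004 1
2 2001 2
2 2002 2
2 2003 2
2 2004 1
end
xtset id year

* Built-in transition table: row = current state, column = next state
xttrans status

* The same table by hand: status one year later, then row percentages
gen long next_status = F.status
tabulate status next_status, row nofreq
```

                        In this toy data, five transitions start in state 2: three stay in state 2, one goes to state 3, and one goes to state 1, so the state 2 row reads 20%, 60%, 20% across columns 1, 2, 3. No row appears for state 1 because no transition out of it is observed.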

                        That said, I think there is a problem with your data. It is strange to observe that from state 3 ("revised") there is 100% probability of transition to state 1 ("dead".) The implication is that nobody ever survives a revision! Looking at your example data, I see that whenever an id comes to state 3, that is the final observation of their data--you have no follow-up on anybody beyond the revision. Given the results you got for the transition probabilities, it seems likely that the same is true in the entire data set. You need, therefore, to augment the data. The best way to do that would be to go back and get real data on what happened to these people in the year(s) after their revisions. If that is not feasible, then for every person whose data ends in a state other than death, you need to add another observation with a new state: lost to follow-up.
                        Code:
                        //    FILLIN MISSING YEARS, CARRYING STATUS FORWARD
                        xtset id year
                        tsfill
                        by id (year), sort: replace status = L1.status if missing(status)
                        
                        //    ADD LOST TO FOLLOW-UP AS FINAL STATE FOR THOSE NOT DEAD AT END OF DATA
                        label define status    0    "ltfu", add
                        by id (year): gen expander = cond(_n == _N & status != 1, 2, 1)
                        expand expander
                        by id (year), sort: replace year = year +1 if _n == _N & expander == 2
                        by id (year), sort: replace status = 0 if _n == _N & status != 1
                        
                        xttrans status
                        As for your second question, yes, you should have done this separately for treatment == 1 and treatment == 0. To make this work, however, the above code needs some additional modification to spread the value of treatment to all of the filled in observations. So, it becomes:
                        Code:
                        //    FILLIN MISSING YEARS, CARRYING STATUS FORWARD
                        xtset id year
                        tsfill
                        by id (treatment), sort: replace treatment = treatment[1]
                        by id (year), sort: replace status = L1.status if missing(status)
                        
                        //    ADD LOST TO FOLLOW-UP AS FINAL STATE FOR THOSE NOT DEAD AT END OF DATA
                        label define status    0    "ltfu", add
                        by id (year): gen expander = cond(_n == _N & status != 1, 2, 1)
                        expand expander
                        by id (year), sort: replace year = year +1 if _n == _N & expander == 2
                        by id (year), sort: replace status = 0 if _n == _N & status != 1
                        
                        
                        xttrans status if treatment
                        xttrans status if !treatment
