
  • Generating duration and failure variables to run a Kaplan-Meier estimate of the survival function

    Hi everyone,

    I am currently working on generating Kaplan-Meier estimates for a survival function. To start, I need to calculate variables representing the time intervals between consecutive births, which I’ll call duration1_2, duration2_3, ..., up to duration10_11. For example, duration1_2 will show the time in months between the first and second child, duration2_3 will show the interval between the second and third child, and so on up to duration10_11, which represents the time between the tenth and eleventh child.

    Next, I want to create a set of "failure" variables (failure1_2, failure2_3, ..., failure10_11). Each failure variable will be a dummy indicator: it will be set to 1 if the corresponding duration is less than 24 months, and 0 otherwise.

    Additionally, I plan to calculate duration_trunc, which will represent the time (in months) between a woman's last recorded birth and the survey date. If a woman has fewer than 9 children, duration_trunc will be assigned a value of 1000 months to indicate censoring.

    I’ll display the dataset and show the transformation steps I plan to take to prepare it for Kaplan-Meier analysis. Key variables in my dataset include caseid, v007, seq, b19, and v201.

    Let me know if any part needs clarification.

    Variable Storage Display Value
    name type format label Variable label
    --------------------------------------------------------------------------------------------
    caseid str15 %15s case identification
    v007 int %8.0g year of interview
    seq str3 %9s
    b19 int %8.0g current age of child in months
    v201 byte %8.0g total children ever born


    The variables above are the ones currently in my dataset.

    dataex caseid v007 seq b19 v201

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str15 caseid int v007 str3 seq int b19 byte v201
    "       1   1  2" 2018 "_01"  42 5
    "       1   1  2" 2018 "_02" 111 5
    "       1   1  2" 2018 "_03" 134 5
    "       1   1  2" 2018 "_04" 160 5
    "       1   1  2" 2018 "_05" 187 5
    "       1   1  2" 2018 "_06"   . 5
    "       1   1  2" 2018 "_07"   . 5
    "       1   1  2" 2018 "_08"   . 5
    "       1   1  2" 2018 "_09"   . 5
    "       1   1  2" 2018 "_10"   . 5
    "       1   1  2" 2018 "_11"   . 5
    "       1   1  2" 2018 "_12"   . 5
    "       1   1  2" 2018 "_13"   . 5
    "       1   1  2" 2018 "_14"   . 5
    "       1   1  2" 2018 "_15"   . 5
    "       1   1  2" 2018 "_16"   . 5
    "       1   1  2" 2018 "_17"   . 5
    "       1   1  2" 2018 "_18"   . 5
    "       1   1  2" 2018 "_19"   . 5
    "       1   1  2" 2018 "_20"   . 5
    "       1   4  1" 2018 "_01"  66 5
    "       1   4  1" 2018 "_02"  93 5
    "       1   4  1" 2018 "_03" 159 5
    "       1   4  1" 2018 "_04" 184 5
    "       1   4  1" 2018 "_05" 212 5
    "       1   4  1" 2018 "_06"   . 5
    "       1   4  1" 2018 "_07"   . 5
    "       1   4  1" 2018 "_08"   . 5
    "       1   4  1" 2018 "_09"   . 5
    "       1   4  1" 2018 "_10"   . 5
    "       1   4  1" 2018 "_11"   . 5
    "       1   4  1" 2018 "_12"   . 5
    "       1   4  1" 2018 "_13"   . 5
    "       1   4  1" 2018 "_14"   . 5
    "       1   4  1" 2018 "_15"   . 5
    "       1   4  1" 2018 "_16"   . 5
    "       1   4  1" 2018 "_17"   . 5
    "       1   4  1" 2018 "_18"   . 5
    "       1   4  1" 2018 "_19"   . 5
    "       1   4  1" 2018 "_20"   . 5
    "       1   5  2" 2018 "_01"  73 5
    "       1   5  2" 2018 "_02" 104 5
    "       1   5  2" 2018 "_03" 115 5
    "       1   5  2" 2018 "_04" 147 5
    "       1   5  2" 2018 "_05" 196 5
    "       1   5  2" 2018 "_06"   . 5
    "       1   5  2" 2018 "_07"   . 5
    "       1   5  2" 2018 "_08"   . 5
    "       1   5  2" 2018 "_09"   . 5
    "       1   5  2" 2018 "_10"   . 5
    "       1   5  2" 2018 "_11"   . 5
    "       1   5  2" 2018 "_12"   . 5
    "       1   5  2" 2018 "_13"   . 5
    "       1   5  2" 2018 "_14"   . 5
    "       1   5  2" 2018 "_15"   . 5
    "       1   5  2" 2018 "_16"   . 5
    "       1   5  2" 2018 "_17"   . 5
    "       1   5  2" 2018 "_18"   . 5
    "       1   5  2" 2018 "_19"   . 5
    "       1   5  2" 2018 "_20"   . 5
    "       1   6  2" 2018 "_01"  84 4
    "       1   6  2" 2018 "_02" 134 4
    "       1   6  2" 2018 "_03" 172 4
    "       1   6  2" 2018 "_04" 222 4
    "       1   6  2" 2018 "_05"   . 4
    "       1   6  2" 2018 "_06"   . 4
    "       1   6  2" 2018 "_07"   . 4
    "       1   6  2" 2018 "_08"   . 4
    "       1   6  2" 2018 "_09"   . 4
    "       1   6  2" 2018 "_10"   . 4
    "       1   6  2" 2018 "_11"   . 4
    "       1   6  2" 2018 "_12"   . 4
    "       1   6  2" 2018 "_13"   . 4
    "       1   6  2" 2018 "_14"   . 4
    "       1   6  2" 2018 "_15"   . 4
    "       1   6  2" 2018 "_16"   . 4
    "       1   6  2" 2018 "_17"   . 4
    "       1   6  2" 2018 "_18"   . 4
    "       1   6  2" 2018 "_19"   . 4
    "       1   6  2" 2018 "_20"   . 4
    "       1   7  2" 2018 "_01"   . 0
    "       1   7  2" 2018 "_02"   . 0
    "       1   7  2" 2018 "_03"   . 0
    "       1   7  2" 2018 "_04"   . 0
    "       1   7  2" 2018 "_05"   . 0
    "       1   7  2" 2018 "_06"   . 0
    "       1   7  2" 2018 "_07"   . 0
    "       1   7  2" 2018 "_08"   . 0
    "       1   7  2" 2018 "_09"   . 0
    "       1   7  2" 2018 "_10"   . 0
    "       1   7  2" 2018 "_11"   . 0
    "       1   7  2" 2018 "_12"   . 0
    "       1   7  2" 2018 "_13"   . 0
    "       1   7  2" 2018 "_14"   . 0
    "       1   7  2" 2018 "_15"   . 0
    "       1   7  2" 2018 "_16"   . 0
    "       1   7  2" 2018 "_17"   . 0
    "       1   7  2" 2018 "_18"   . 0
    "       1   7  2" 2018 "_19"   . 0
    "       1   7  2" 2018 "_20"   . 0
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 301360 observations
    Use the count() option to list more
    Now I will show what I want the transformed dataset to look like:



    dataex id duration1_2 duration2_3 duration3_4 duration4_5 failure1_2 failure2_3 failure3_4 failure4_5 failure5_6 duration_n failure_n

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(id duration1_2 duration2_3 duration3_4 duration4_5 failure1_2 failure2_3 failure3_4 failure4_5 failure5_6 duration_n failure_n)
    1020902  27 39  .  . 1 0 . . .  27 1
    1020902  27 39  .  . 1 0 . . .  39 0
    1030403  29  . 34 32 1 1 1 1 1  29 1
    1030403  29  . 34 32 1 1 1 1 1   . 1
    1030403  29  . 34 32 1 1 1 1 1  34 1
    1030602  46 18  .  . 1 0 . . .  46 1
    1030602  46 18  .  . 1 0 . . .  18 0
    1031012  35  3  .  . 1 0 . . .  35 1
    1031012  35  3  .  . 1 0 . . .   3 0
    1031202  16  .  .  . 0 . . . .  16 0
    1040212  25 24 46  . 1 1 0 . .  25 1
    1040212  25 24 46  . 1 1 0 . .  24 1
    1040212  25 24 46  . 1 1 0 . .  46 0
    1040217  23 24 32 28 1 1 1 1 0  23 1
    1040217  23 24 32 28 1 1 1 1 0  24 1
    1040217  23 24 32 28 1 1 1 1 0  32 1
    1040303  58 24 25 23 1 1 1 1 0  58 1
    1040303  58 24 25 23 1 1 1 1 0  24 1
    1040303  58 24 25 23 1 1 1 1 0  25 1
    1040316  30 58  2  . 1 1 0 . .  30 1
    1040316  30 58  2  . 1 1 0 . .  58 1
    1040316  30 58  2  . 1 1 0 . .   2 0
    1040702  32  .  .  . 0 . . . .  32 0
    1040902  23 85 23 14 1 1 1 0 .  23 1
    1040902  23 85 23 14 1 1 1 0 .  85 1
    1040902  23 85 23 14 1 1 1 0 .  23 1
    1041002  29 49  6  . 1 1 0 . .  29 1
    1041002  29 49  6  . 1 1 0 . .  49 1
    1041002  29 49  6  . 1 1 0 . .   6 0
    1050109  15  .  .  . 0 . . . .  15 0
    1050313  26  .  .  . 0 . . . .  26 0
    1050605  24 40 46  3 1 1 1 0 .  24 1
    1050605  24 40 46  3 1 1 1 0 .  40 1
    1050605  24 40 46  3 1 1 1 0 .  46 1
    1050612  27  4  .  . 1 0 . . .  27 1
    1050612  27  4  .  . 1 0 . . .   4 0
    1050702  23  .  .  . 0 . . . .  23 0
    1051001  25 12  .  . 1 0 . . .  25 1
    1051001  25 12  .  . 1 0 . . .  12 0
    1060410   2  .  .  . 0 . . . .   2 0
    1060414  60  .  .  . 0 . . . .  60 0
    1070102 167  .  .  . 0 . . . . 167 0
    1070403  13  .  .  . 0 . . . .  13 0
    1070602   .  .  .  . 1 0 . . .   . 1
    1070602   .  .  .  . 1 0 . . .   . 0
    1070803  29 25 23 67 1 1 1 1 0  29 1
    1070803  29 25 23 67 1 1 1 1 0  25 1
    1070803  29 25 23 67 1 1 1 1 0  23 1
    1071014  14 26 53 27 1 1 1 1 1  14 1
    1071014  14 26 53 27 1 1 1 1 1  26 1
    1071014  14 26 53 27 1 1 1 1 1  53 1
    1071022  15  .  .  . 0 . . . .  15 0
    1071025  53 25 37  . 1 1 0 . .  53 1
    1071025  53 25 37  . 1 1 0 . .  25 1
    1071025  53 25 37  . 1 1 0 . .  37 0
    1071030  13  .  .  . 0 . . . .  13 0
    1071109  36 12 96 36 1 1 1 1 0  36 1
    1071109  36 12 96 36 1 1 1 1 0  12 1
    1071109  36 12 96 36 1 1 1 1 0  96 1
    1080408  86  .  .  . 0 . . . .  86 0
    1090104  19  3  .  . 1 0 . . .  19 1
    1090104  19  3  .  . 1 0 . . .   3 0
    1090207  50  .  .  . 0 . . . .  50 0
    1090209  52  .  .  . 0 . . . .  52 0
    1100509  21  .  .  . 0 . . . .  21 0
    1100816  26 32  .  . 1 0 . . .  26 1
    1100816  26 32  .  . 1 0 . . .  32 0
    1100824  26 32 38 34 1 1 1 0 .  26 1
    1100824  26 32 38 34 1 1 1 0 .  32 1
    1100824  26 32 38 34 1 1 1 0 .  38 1
    1100829  26  . 22  . 1 1 0 . .  26 1
    1100829  26  . 22  . 1 1 0 . .   . 1
    1100829  26  . 22  . 1 1 0 . .  22 0
    1101012 200  2  .  . 1 0 . . . 200 1
    1101012 200  2  .  . 1 0 . . .   2 0
    1101016  56  .  .  . 0 . . . .  56 0
    1101201  65 42  .  . 1 0 . . .  65 1
    1101201  65 42  .  . 1 0 . . .  42 0
    1110204  41  .  .  . 0 . . . .  41 0
    1110502  20 28 43 52 1 1 1 0 .  20 1
    1110502  20 28 43 52 1 1 1 0 .  28 1
    1110502  20 28 43 52 1 1 1 0 .  43 1
    1120102  36  .  .  . 1 1 0 . .  36 1
    1120102  36  .  .  . 1 1 0 . .   . 1
    1120102  36  .  .  . 1 1 0 . .   . 0
    1120705  31 12  .  . 1 0 . . .  31 1
    1120705  31 12  .  . 1 0 . . .  12 0
    1130102  42 51 24 51 1 1 1 1 0  42 1
    1130102  42 51 24 51 1 1 1 1 0  51 1
    1130102  42 51 24 51 1 1 1 1 0  24 1
    1130801  36 62  4  . 1 1 0 . .  36 1
    1130801  36 62  4  . 1 1 0 . .  62 1
    1130801  36 62  4  . 1 1 0 . .   4 0
    1130901  57  .  .  . 0 . . . .  57 0
    1131002  24  4  .  . 1 0 . . .  24 1
    1131002  24  4  .  . 1 0 . . .   4 0
    1140202   .  .  .  . 1 0 . . .   . 1
    1140202   .  .  .  . 1 0 . . .   . 0
    1140502  82  .  .  . 1 1 1 1 0  82 1
    1140502  82  .  .  . 1 1 1 1 0   . 1
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 100 out of 2016 observations
    Use the count() option to list more


  • #2
    You can have most, but not all, of what you ask for.

    Code:
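    * optional: give the variables meaningful names
    rename v007 interview_year
    rename b19 child_age
    rename v201 total_children
    
    * "_01" -> 1, "_02" -> 2, ...
    destring seq, gen(birth_order) ignore("_")
    
    * interval (months) between consecutive births of the same woman;
    * the -by- prefix keeps _n+1 from reaching into the next woman's records
    by caseid (birth_order), sort: gen duration = child_age[_n+1] - child_age
    gen failure = duration < 24 if !missing(duration)
    
    * note: in this extract seq "_01" holds the youngest child, so "1_2" is
    * the interval between the two most recent births
    gen suffix = string(birth_order) + "_" + string(birth_order + 1)
    
    drop if missing(duration)
    keep caseid duration failure suffix
    reshape wide duration failure, i(caseid) j(suffix) string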
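    One caveat on ordering: in the example the child with seq "_01" is the youngest (42 months) and ages rise with seq, which is how DHS-style birth histories are usually stored (most recent birth first). If that holds for the whole file, interval labels read straight from seq run backwards relative to the question, where duration1_2 is meant to be the gap between the first and second child. Below is a minimal sketch of a chronological version, assuming that ordering. The helper name seq_num is hypothetical, and the input rows are the first woman's records from the posted example.

```stata
* Sketch: chronological birth intervals,
* ASSUMING seq "_01" is the most recent birth (ages rise with seq)
clear
input str15 caseid str3 seq int b19 byte v201
"       1   1  2" "_01"  42 5
"       1   1  2" "_02" 111 5
"       1   1  2" "_03" 134 5
"       1   1  2" "_04" 160 5
"       1   1  2" "_05" 187 5
"       1   1  2" "_06"   . 5
end

* "_01" -> 1, "_02" -> 2, ... (seq_num is a hypothetical helper name)
destring seq, gen(seq_num) ignore("_")

* keep only births actually recorded
drop if missing(b19)

* chronological order within woman: 1 = oldest child (highest seq_num)
bysort caseid (seq_num): gen birth_order = _N - _n + 1

* interval k_(k+1) = age of child k minus age of child k+1, in months
bysort caseid (birth_order): gen duration = b19 - b19[_n+1]
gen failure = duration < 24 if !missing(duration)

list caseid birth_order b19 duration failure, noobs
```

    With birth_order defined this way, building the suffix and reshaping as in the code above would put the first-to-second-child gap (27 months for this woman) in duration1_2.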
    You can't have duration_trunc because it is impossible to determine the number of months between anything and the interview--that's because all we know about when the interview happened is the year.
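    That said, if b19 is each child's age in months at the interview date, as its label "current age of child in months" suggests, the months between the last birth and the interview might be recoverable without the interview month: that interval is the age of the most recent child. Below is a minimal sketch, assuming seq "_01" is the most recent birth (as the ascending ages in the example indicate). The input rows come from the posted example. Women with no recorded births get a missing value, and the question's 1000-month censoring rule would still have to be applied separately with -replace-.

```stata
* Sketch: months since the most recent birth, ASSUMING b19 is measured
* at the interview and seq "_01" is the most recent birth
clear
input str15 caseid str3 seq int b19 byte v201
"       1   1  2" "_01"  42 5
"       1   1  2" "_02" 111 5
"       1   6  2" "_01"  84 4
"       1   6  2" "_02" 134 4
"       1   7  2" "_01"   . 0
"       1   7  2" "_02"   . 0
end

destring seq, gen(seq_num) ignore("_")

* within each woman, b19[1] is the age of the "_01" (youngest) child
bysort caseid (seq_num): gen duration_trunc = b19[1]
```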

    You can't have the ids you show in your example because they seem to come from nowhere. They are not the caseids in the starting data and bear no apparent relationship to them. So without some explanation of where those new ids come from, there is no way to code their creation. The code above keeps the original caseids instead.

    The renaming of the variables at the start of the code is not strictly necessary. You can use the original variable names throughout if you prefer. I just find it difficult and annoying to work with variable names that don't tell me anything about what they are. It's much too easy to type the wrong variable name in code when the variable is some meaningless combination of alphanumeric characters, and then finding the source of the resulting errors when they show up in results is more frustrating than I can tolerate. So I always rename variables to something meaningful. YMMV.



    • #3
      Thank you for your guidance. Now I want to go further and generate duration_n and failure_n, which would show duration1_2 to duration8_9 and failure1_2 to failure8_9 for each caseid while keeping the caseid intact, similar to the dataset below:
      id duration1_2 duration2_3 duration3_4 duration4_5 duration5_6 duration6_7 duration7_8 duration8_9 duration_n failure1_2 failure2_3 failure3_4 failure4_5 failure5_6 failure6_7 failure7_8 failure8_9 failure_n
      1020902 27 39 27 1 0 1
      1020902 27 39 39 1 0 0
      1030403 29 34 32 153 31 22 29 1 1 1 1 1 1 1 1 1
      1030403 29 34 32 153 31 22 1 1 1 1 1 1 1 1 1
      1030403 29 34 32 153 31 22 34 1 1 1 1 1 1 1 1 1
      1030602 46 18 46 1 0 1
      1030602 46 18 18 1 0 0
      1031012 35 3 35 1 0 1
      1031012 35 3 3 1 0 0
      1031202 16 16 0 0
      1040212 25 24 46 25 1 1 0 1
      1040212 25 24 46 24 1 1 0 1
      1040212 25 24 46 46 1 1 0 0
      1040217 23 24 32 28 8 23 1 1 1 1 0 1
      1040217 23 24 32 28 8 24 1 1 1 1 0 1
      1040217 23 24 32 28 8 32 1 1 1 1 0 1
      1040303 58 24 25 23 26 58 1 1 1 1 0 1
      1040303 58 24 25 23 26 24 1 1 1 1 0 1
      1040303 58 24 25 23 26 25 1 1 1 1 0 1
      1040316 30 58 2 30 1 1 0 1
      1040316 30 58 2 58 1 1 0 1
      1040316 30 58 2 2 1 1 0 0
      1040702 32 32 0 0
      1040902 23 85 23 14 23 1 1 1 0 1
      1040902 23 85 23 14 85 1 1 1 0 1
      1040902 23 85 23 14 23 1 1 1 0 1
      1041002 29 49 6 29 1 1 0 1
      1041002 29 49 6 49 1 1 0 1
      1041002 29 49 6 6 1 1 0 0
      1050109 15 15 0 0
      1050313 26 26 0 0
      1050605 24 40 46 3 24 1 1 1 0 1
      1050605 24 40 46 3 40 1 1 1 0 1
      1050605 24 40 46 3 46 1 1 1 0 1
      1050612 27 4 27 1 0 1
      1050612 27 4 4 1 0 0
      1050702 23 23 0 0
      1051001 25 12 25 1 0 1
      1051001 25 12 12 1 0 0
      1060410 2 2 0 0
      1060414 60 60 0 0
      1070102 167 167 0 0
      1070403 13 13 0 0
      1070602 1 0 1
      1070602 1 0 0
      1070803 29 25 23 67 3 29 1 1 1 1 0 1
      1070803 29 25 23 67 3 25 1 1 1 1 0 1
      1070803 29 25 23 67 3 23 1 1 1 1 0 1
      1071014 14 26 53 27 47 2 14 1 1 1 1 1 0 1
      1071014 14 26 53 27 47 2 26 1 1 1 1 1 0 1
      1071014 14 26 53 27 47 2 53 1 1 1 1 1 0 1
      1071022 15 15 0 0
      1071025 53 25 37 53 1 1 0 1
      1071025 53 25 37 25 1 1 0 1
      1071025 53 25 37 37 1 1 0 0
      1071030 13 13 0 0
      1071109 36 12 96 36 0 36 1 1 1 1 0 1
      1071109 36 12 96 36 0 12 1 1 1 1 0 1
      1071109 36 12 96 36 0 96 1 1 1 1 0 1
      1080408 86 86 0 0
      1090104 19 3 19 1 0 1
      1090104 19 3 3 1 0 0
      1090207 50 50 0 0
      1090209 52 52 0 0
      1100509 21 21 0 0
      1100816 26 32 26 1 0 1
      1100816 26 32 32 1 0 0
      1100824 26 32 38 34 26 1 1 1 0 1
      1100824 26 32 38 34 32 1 1 1 0 1
      1100824 26 32 38 34 38 1 1 1 0 1
      1100829 26 22 26 1 1 0 1
      1100829 26 22 1 1 0 1
      1100829 26 22 22 1 1 0 0
      1101012 200 2 200 1 0 1
      1101012 200 2 2 1 0 0
      1101016 56 56 0 0
      1101201 65 42 65 1 0 1
      1101201 65 42 42 1 0 0
      1110204 41 41 0 0
      1110502 20 28 43 52 20 1 1 1 0 1
      1110502 20 28 43 52 28 1 1 1 0 1
      1110502 20 28 43 52 43 1 1 1 0 1
      as I intend to apply:
      stset duration_n, failure(failure_n)
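      For -stset- the data probably do not need to be wide at all. The long layout that exists just before the -reshape- in #2 already has one birth interval per row, which is the shape the command works with. Below is a minimal sketch, assuming variables named duration and failure as created in #2. The intervals were worked out by hand, in chronological order, from two women in the first posted example (caseids "1 1 2" and "1 6 2").

```stata
* Sketch: one row per birth interval, ASSUMING duration/failure as built in #2
clear
input str15 caseid byte birth_order int duration byte failure
"       1   1  2" 1 27 0
"       1   1  2" 2 26 0
"       1   1  2" 3 23 1
"       1   1  2" 4 69 0
"       1   6  2" 1 50 0
"       1   6  2" 2 38 0
"       1   6  2" 3 50 0
end

* no reshape: each row is already one spell for -stset-
rename duration duration_n
rename failure failure_n
stset duration_n, failure(failure_n)

* Kaplan-Meier survivor function, listed and saved as a variable
sts list
sts generate km = s
```

      No id() option is used here. With id(), Stata would expect the rows of one caseid to be consecutive, non-overlapping spells of a single subject, which these intervals (each starting at 0) are not. Intervals of 0 months (for example twins, who share the same age) would likely be excluded by -stset- as ending at entry, so you may want to decide how to treat multiple births before this step.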


      • #4
        Sorry, but I do not understand what you are asking for here.
