Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Coding for three groups - cases, controls , and other diseases - What a dilemma!

    Hi Statalist,

    I am new her but I have learned a lot from you so far, so thank you all.

    I am using STATA IC 15.1 and the problem I am facing is as follows:

    There are 17 variables e.g. n_1_0 to n_3_2 and the values in these variables like this: number ranged from 1001 -99999 and (.).
    The code I want is:
    1044 for the cases
    Missing (.) values for controls
    Any other numbers excludes 1044 and 99999 and . will represents other diseases group

    The data look like this:
    id n_1_0 n_1_2 n_1_3 n_1_4 n_1_5
    1 - - - - -
    2 1022 1075 - - -
    3 - - 99999 - -
    4 - 1044 - 1044 --
    5 1044 - - - 1006
    6 - - 1044 - -
    7 1010 - - 1044 -


    etc.

    Now I have coded the cases just fine. The code is

    gen status = .
    replace status = 1 if n_1_0==1044 | n_1_2==1044 | n_1_3==1044 | n_1_4==1044 | n_1_5==1044 <<< any time number 1044 recorded that's why I used | (OR)

    and I got 4,123 hits

    similar to controls:
    replace status=2 if n_1_0==. & n_1_2==. & n_1_3==. & n_1_4==. & n_1_5==. <<< It has to be missing in all variables to be control, that's why I used & (AND)

    and I got 457,300

    Here the problem arise every time I try to code for other diseases. And the code I used is:
    replace status = 3 if n_1_0 >=1001 & n_1_0 <99999 & n_1_0 !=1044 and repeat it for other variables.

    What happen after this command is that the number of cases reduced to 3,745 and I think the issue comes from examples id 4 where number 1044 occur twice and id 5 where there is different number such as 1044 and 1006 and the number 1044 comes first and vice versa in id 7.


    I hope anyone help me with this problem and what is the best way to solve it as I am going to deal with much larger data sets like this.

    Thanks!

  • #2
    Welcome to Statalist.

    See if this gets you what you want. I'm not quite sure how you want 99999's coded.


    ----------------------- copy starting from the next line -----------------------
    Code:
    dataex  // To install: ssc install dataex
    clear
    input byte id int n_1_0 long n_1_2 int(n_1_3 n_1_4 n_1_5)
    1    .     .    .    .    .
    2 1022  1075    .    .    .
    3    . 99999    .    .    .
    4 1044     .    . 1044    .
    5 1044     .    .    . 1006
    6    .     . 1044    .    .
    7 1010     .    .    . 1044
    end
    
    . list
    
         +--------------------------------------------+
         | id   n_1_0   n_1_2   n_1_3   n_1_4   n_1_5 |
         |--------------------------------------------|
      1. |  1       .       .       .       .       . |
      2. |  2    1022    1075       .       .       . |
      3. |  3       .   99999       .       .       . |
      4. |  4    1044       .       .    1044       . |
      5. |  5    1044       .       .       .    1006 |
         |--------------------------------------------|
      6. |  6       .       .    1044       .       . |
      7. |  7    1010       .       .       .    1044 |
         +--------------------------------------------+
    
    egen status = anymatch(n*), values(1044)  // This assumes the vars of interest always named n_1_0, n_1_2, etc
    egen is_99999 = anymatch(n*), values(99999)  // Didn't know what you wanted to do with 99999, this sets is_99999==1
    egen notmissing = rownonmiss(n*)  // counts number of nonmissing values in n*
    gen control = ( notmissing==0)
    replace status = 2 if notmissing==0
    replace status = 3 if status==0
    
    . list, noobs abbrev(12) sepby(status)
    
      +---------------------------------------------------------------------------------------+
      | id   n_1_0   n_1_2   n_1_3   n_1_4   n_1_5   status   is_99999   notmissing   control |
      |---------------------------------------------------------------------------------------|
      |  1       .       .       .       .       .        2          0            0         1 |
      |---------------------------------------------------------------------------------------|
      |  2    1022    1075       .       .       .        3          0            2         0 |
      |  3       .   99999       .       .       .        3          1            1         0 |
      |---------------------------------------------------------------------------------------|
      |  4    1044       .       .    1044       .        1          0            2         0 |
      |  5    1044       .       .       .    1006        1          0            2         0 |
      |  6       .       .    1044       .       .        1          0            1         0 |
      |  7    1010       .       .       .    1044        1          0            2         0 |
      +---------------------------------------------------------------------------------------+
    
    . tabulate status
    
    n_1_0 n_1_2 |
    n_1_3 n_1_4 |
       n_1_5 == |
           1044 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |          4       57.14       57.14
              2 |          1       14.29       71.43
              3 |          2       28.57      100.00
    ------------+-----------------------------------
          Total |          7      100.00
    Last edited by David Benson; 02 Nov 2019, 19:08.

    Comment


    • #3
      Hi David,

      Thank you so much for your response.

      Well, actually, there are numerous other variables that starts with n* and the example that I gave here is one of them.
      For 99999, this I want to discard it. So, neither they will be other diseases nor controls. They will be ignored from the analysis.

      Hope that make sense!

      Comment


      • #4
        David your code seems to work but further to explain the point, status 3 should be something like this: n* !=. & (n*!=1044 & n*!=99999). Hope you get it now!

        Comment


        • #5
          Click image for larger version

Name:	Id.PNG
Views:	1
Size:	5.1 KB
ID:	1523079


          To be more precise, id 3 above contain only 99999 and here we want to discard it completely (as it not their in the data but not as a missing). In contrast, in id 8 there is another number besides 99999, here we want to count that number in the analysis and ignore the 99999. So in essence, id 3 will not be calculated at all in the analysis and id 8 will be considered as other diseases.

          Any help!

          Comment


          • #6
            Mohammad,

            1) So, screenshots aren't as helpful as you think. In the future, please share your data using Stata's dataex command. (Most Stata users aren't familiar with dataex unless they've posted here before. I created a short Youtube video on dataex here. (I made it too long--feel free to watch at 2x speed, and you may only need the first 6 minutes) .

            See if this works:

            Code:
            // I added one more obs to what you posted in post #5.  Something with a 1044 and 99999
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input byte id int n_1_0 long(n_1_2 n_1_3) int(n_1_4 n_1_5)
             1    .     .     .    .    .
             2 1022  1075     .    .    .
             3    . 99999     .    .    .
             4 1044     .     . 1044    .
             5 1044     .     .    . 1006
             6    .     .  1044    .    .
             7 1010     .     .    . 1044
             8 1023 99999     .    .    .
             9 1044     . 99999    .    .
            10    .     .     .    .    .
            end
            
            . list, noobs
            
              +--------------------------------------------+
              | id   n_1_0   n_1_2   n_1_3   n_1_4   n_1_5 |
              |--------------------------------------------|
              |  1       .       .       .       .       . |
              |  2    1022    1075       .       .       . |
              |  3       .   99999       .       .       . |
              |  4    1044       .       .    1044       . |
              |  5    1044       .       .       .    1006 |
              |--------------------------------------------|
              |  6       .       .    1044       .       . |
              |  7    1010       .       .       .    1044 |
              |  8    1023   99999       .       .       . |
              |  9    1044       .   99999       .       . |
              | 10       .       .       .       .       . |
              +--------------------------------------------+
            
            rename n_1_* var*  // renaming the disease variables 
            
            
              +-----------------------------------------+
              | id   var0    var2    var3   var4   var5 |
              |-----------------------------------------|
              |  1      .       .       .      .      . |
              |  2   1022    1075       .      .      . |
              |  3      .   99999       .      .      . |
              |  4   1044       .       .   1044      . |
              |  5   1044       .       .      .   1006 |
              |-----------------------------------------|
              |  6      .       .    1044      .      . |
              |  7   1010       .       .      .   1044 |
              |  8   1023   99999       .      .      . |
              |  9   1044       .   99999      .      . |
              | 10      .       .       .      .      . |
              +-----------------------------------------+
            
            // Could use egen, anycount() val(99999) to count how many times 99999 is in var0-var5
            egen has_1044 = anymatch(var0-var5), values(1044)  // 1 if any of those variables = 1044
            egen has_99999 = anymatch(var0-var5), values(99999)  // 1 if any of those variables == 99999
            egen notmissing = rownonmiss(var0-var5)  // how many of the variables are not missing
            gen case = (has_1044==1)
            gen control = ( notmissing==0)  // var0-var5 are ALL blank
            gen to_drop = (has_9==1 & notmissing==1)  // var0-var5 contains ONLY 99999 and all others are blank
            
            gen status = "1-case" if case==1
            replace status = "2-control" if control==1
            replace status = "4-remove" if to_drop==1
            rename has_99999 has_99  // shortening var name to make easier to list
            replace status = "3-Other disease" if status=="" 
            compress
            sort status id
            order status, after(id)  // places status after id variable
            gen sample = (status=="1-case" | status=="2-control")
            
            . list, noobs sepby(status)
            
              +------------------------------------------------------------------------------------------------------------------------------+
              | id            status   var0    var2    var3   var4   var5   has_1044   has_99   notmis~g   case   control   to_drop   sample |
              |------------------------------------------------------------------------------------------------------------------------------|
              |  4            1-case   1044       .       .   1044      .          1        0          2      1         0         0        1 |
              |  5            1-case   1044       .       .      .   1006          1        0          2      1         0         0        1 |
              |  6            1-case      .       .    1044      .      .          1        0          1      1         0         0        1 |
              |  7            1-case   1010       .       .      .   1044          1        0          2      1         0         0        1 |
              |  9            1-case   1044       .   99999      .      .          1        1          2      1         0         0        1 |
              |------------------------------------------------------------------------------------------------------------------------------|
              |  1         2-control      .       .       .      .      .          0        0          0      0         1         0        1 |
              | 10         2-control      .       .       .      .      .          0        0          0      0         1         0        1 |
              |------------------------------------------------------------------------------------------------------------------------------|
              |  2   3-Other disease   1022    1075       .      .      .          0        0          2      0         0         0        0 |
              |  8   3-Other disease   1023   99999       .      .      .          0        1          2      0         0         0        0 |
              |------------------------------------------------------------------------------------------------------------------------------|
              |  3          4-remove      .   99999       .      .      .          0        1          1      0         0         1        0 |
              +------------------------------------------------------------------------------------------------------------------------------+
            Hopefully this gets you what you want, or the other variables (has_1044, has_99, notmissing, etc) that you can create the categories that you want.

            Comment


            • #7
              I have tried dataex but will watch the link you sent. Your code now is perfect. Thank you so much David! I really appreciate your help.

              Best wishes.

              Comment


              • #8
                Hi David Benson hope you are well. So I used dataex as you suggest after watching your video.

                I wonder if there is a compact command to code the following condition:

                generate incidence= 1 if n_20007* < n_210003_0_0 and status==1

                I have tried to code it separately and then gen an new var and combine it but it does not seems right. I feel newbie now though I have been using STATA now for more than a year. It becomes hard when dealing with a very large data. so I hope you can help.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input double(n_20007_0_0 n_20007_0_1 n_20007_0_2 n_20007_0_3 n_20007_0_4 n_20007_0_5 n_20007_1_0 n_20007_1_1 n_20007_1_2 n_20007_1_3 n_20007_2_0) int n_21003_0_0 float status
                      . . . . . . . . . .       . 57 2
                      . . . . . . . . . .       . 58 2
                      . . . . . . . . . .       . 63 2
                      . . . . . . . . . .       . 61 2
                      . . . . . . . . . .       . 47 2
                      . . . . . . . . . .       . 61 2
                56.4939 . . . . . . . . .       . 58 3
                      . . . . . . . . . .       . 63 2
                      . . . . . . . . . .       . 50 2
                      . . . . . . . . . .       . 63 2
                      . . . . . . . . . .       . 66 2
                      . . . . . . . . . .       . 67 2
                      . . . . . . . . . .       . 59 2
                      . . . . . . . . . .       . 59 2
                      . . . . . . . . . .       . 66 2
                60.4885 . . . . . . . . .       . 65 3
                      . . . . . . . . . .       . 57 2
                      . . . . . . . . . .       . 66 2
                      . . . . . . . . . .       . 63 2
                      . . . . . . . . . .       . 46 2
                      . . . . . . . . . .       . 67 2
                57.6684 . . . . . . . . .       . 58 3
                      . . . . . . . . . . 70.1204 66 4
                      . . . . . . . . . .       . 67 2
                      . . . . . . . . . .       . 53 2
                56.1188 . . . . . . . . .       . 62 3
                60.2338 . . . . . . . . .    61.5 62 1
                      . . . . . . . . . .       . 48 2
                      . . . . . . . . . .       . 66 2
                      . . . . . . . . . .       . 64 2
                      . . . . . . . . . .       . 57 2
                      . . . . . . . . . .       . 43 2
                      . . . . . . . . . .       . 45 2
                      . . . . . . . . . .       . 52 2
                      . . . . . . . . . .       . 55 2
                      . . . . . . . . . .       . 59 2
                      . . . . . . . . . .       . 50 2
                      . . . . . . . . . .       . 60 2
                      . . . . . . . . . .       . 56 2
                      . . . . . . . . . .       . 47 2
                      . . . . . . . . . .       . 50 2
                      . . . . . . . . . .       . 58 2
                      . . . . . . . . . .       . 50 2
                      . . . . . . . . . .       . 60 2
                      . . . . . . . . . .       . 64 2
                      . . . . . . . . . .       . 63 2
                      . . . . . . . . . .       . 68 2
                      . . . . . . . . . .       . 61 2
                37.0273 . . . . . . . . .       . 49 3
                      . . . . . . . . . .       . 52 2
                      . . . . . . . . . .       . 61 2
                      . . . . . . . . . .       . 68 2
                46.6401 . . . . . . . . .       . 47 3
                      . . . . . . . . . .       . 69 2
                      . . . . . . . . . .       . 58 2
                      . . . . . . . . . .       . 58 2
                50.2651 . . . . . . . . .       . 57 4
                      . . . . . . . . . .       . 68 2
                      . . . . . . . . . .       . 48 2
                      . . . . . . . . . .       . 43 2
                      . . . . . . . . . .       . 55 2
                      . . . . . . . . . .       . 57 2
                      . . . . . . . . . .       . 67 2
                      . . . . . . . . . .       . 55 2
                      . . . . . . . . . .       . 50 2
                      . . . . . . . . . .       . 60 2
                      . . . . . . . . . .       . 53 2
                      . . . . . . . . . .       . 52 2
                      . . . . . . . . . .       . 41 2
                      . . . . . . . . . .       . 62 2
                51.9955 . . . . . . . . . 52.9948 54 3
                      . . . . . . . . . .       . 68 2
                      . . . . . . . . . .       . 63 2
                      . . . . . . . . . .       . 48 2
                      . . . . . . . . . .       . 67 2
                      . . . . . . . . . .       . 61 2
                      . . . . . . . . . .       . 62 2
                      . . . . . . . . . .       . 41 2
                      . . . . . . . . . .       . 63 2
                      . . . . . . . . . .       . 62 2
                      . . . . . . . . . .       . 63 2
                      . . . . . . . . . .       . 66 2
                      . . . . . . . . . .       . 40 2
                      . . . . . . . . . .       . 53 2
                      . . . . . . . . . .       . 52 2
                      . . . . . . . . . .       . 43 2
                      . . . . . . . . . .       . 52 2
                      . . . . . . . . . .       . 67 2
                      . . . . . . . . . .       . 64 2
                      . . . . . . . . . .       . 49 2
                      . . . . . . . . . .       . 66 2
                50.6621 . . . . . . . . .       . 53 3
                      . . . . . . . . . .       . 59 2
                      . . . . . . . . . .       . 59 2
                      . . . . . . . . . .       . 54 2
                      . . . . . . . . . .       . 56 2
                      . . . . . . . . . .       . 49 2
                54.0681 . . . . . . . . .       . 62 3
                      . . . . . . . . . .       . 66 2
                      . . . . . . . . . .       . 44 2
                end
                label values status status
                label def status 1 "case", modify
                label def status 2 "control", modify
                label def status 3 "other", modify
                label def status 4 "to_remove", modify

                Thanks in advance!

                Comment


                • #9
                  Never mind David. I guess I figured it out.

                  I used the following commands:
                  egen mx = rowmax(n_20007*)
                  gen incidence = 1 if mx < n_21003_0_0 & status ==1

                  I hope this is correct anyway

                  Comment


                  • #10
                    Hi again

                    As I told you I will face much complex data, here we go. I tried to use egen command anymatch for value let say (C61) I got invalid numlist error because I guess it has letter although I encoded all the variables from string.
                    for instance:
                    egen hasK30= anymatch (s_41270*), values(K30)

                    Here is an example of ICD10 data
                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input str6(s_41270_0_0 s_41270_0_1 s_41270_0_2 s_41270_0_3 s_41270_0_4 s_41270_0_5 s_41270_0_6 s_41270_0_7 s_41270_0_8 s_41270_0_9 s_41270_0_10)
                    "I10"  "M4316" "M513"  "M5457" "R074"  ""      ""      ""      ""     ""     ""    
                    "D649" "J40"   "K30"   "M1997" "N840"  "R104"  ""      ""      ""     ""     ""    
                    "J920" "K219"  "K30"   "K449"  "K921"  "L980"  "R194"  ""      ""     ""     ""    
                    "E780" "M819"  "S3200" "W193"  ""      ""      ""      ""      ""     ""     ""    
                    "K219" "K30"   "K529"  "N309"  "N328"  "O342"  "O641"  "O680"  "O701" "O721" "R31" 
                    ""     ""      ""      ""      ""      ""      ""      ""      ""     ""     ""    
                    "C509" "K801"  "N601"  "Z853"  ""      ""      ""      ""      ""     ""     ""    
                    "H269" ""      ""      ""      ""      ""      ""      ""      ""     ""     ""    
                    "E149" "H605"  "I341"  "I351"  "K529"  "L989"  "M2596" "N840"  "N841" "R103" "T933"
                    ""     ""      ""      ""      ""      ""      ""      ""      ""     ""     ""    
                    "D171" "I10"   "I849"  "K148"  "K579"  "L82"   "M169"  "N183"  "R413" "Z886" "Z888"
                    "E780" "I10"   "I251"  "I350"  "I352"  "R074"  "Z824"  ""      ""     ""     ""    
                    "K802" "Z538"  ""      ""      ""      ""      ""      ""      ""     ""     ""    
                    "H269" "K20"   "K297"  "K30"   "R630"  ""      ""      ""      ""     ""     ""    
                    "E780" "H269"  "I10"   "I209"  "I210"  "I211"  "I252"  "I258"  "J440" "J969" "R074"
                    "I10"  "J459"  "K625"  "R15"   "R194"  ""      ""      ""      ""     ""     ""    
                    "B968" "I10"   "I848"  "K295"  "K297"  "K449"  "M2323" "M2596" "R101" "Z871" ""    
                    "C509" "E780"  "I10"   "J459"  "K219"  "S5250" "W192"  "W195"  "Z880" ""     ""    
                    "I10"  "I802"  "I849"  "K562"  ""      ""      ""      ""      ""     ""     ""    
                    "G560" "L031"  "M109"  "M179"  "M2322" "M2332" "M2557" "M431"  "N134" "N202" "N328"
                    end

                    Is there anyway to use egen commands like anymatch and rownonmiss in this case? @David Benson What do you think?



                    Any help is appreciated!

                    Comment


                    • #11
                      Well, never mind again! I figured it out and will post it for the benefit of others.
                      Code:
                      generate hasK30 = 0
                      foreach var of varlist s_41270_0_0 - s_41270_0_10 {
                      replace  hasC61 = 1 if strpos(`var',"K30")
                      }
                      tab hasK30
                      Finding all blank
                      Code:
                      generate xmiss = 0
                      foreach var of varlist s_41270_0_0 - s_41270_0_10 {
                      replace  xmiss = 1 if strpos(`var',"")
                      }
                      tab xmiss

                      Comment

                      Working...
                      X