
  • #16
    I would keep a single DX variable plus a categorical variable that indicates which of DX1-DX40 each code came from (in other words, the diagnosis data will be in long layout). The criteria dataset would be in long layout too: each row pairs one code, stored in a variable named "DX", with the variable it defines, stored in "var". Once I execute a many-to-one merge on DX, each code is linked to its var. The search will be as simple as:

    Code:
    browse DX which if var=="nstemi"
    However, I do not know how you want to use this data, so if the locals work for you, then you can use them. In general, analysis in Stata is more efficient with data in long layout.
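    A minimal sketch of the reshape-and-merge idea described above, not a definitive implementation. The patient identifier "id", the toy codes, and the tempfile names are assumptions made for illustration; only "DX", "which", and "var" follow the naming in this post. Note that a many-to-one merge needs each code to appear only once in the criteria data and links exact matches only.

    ```stata
    * Hypothetical wide data: one row per patient, codes in DX1-DX3
    clear
    input int id str5(DX1 DX2 DX3)
    1 "K7031" "BC112" "K590"
    2 "VRDX2" "O432"  "K101"
    end

    * Long layout: one row per patient and slot; which holds the slot number
    reshape long DX, i(id) j(which)
    tempfile dxlong
    save `dxlong'

    * Hypothetical criteria data in long layout: one row per code
    clear
    input str6 var str5 DX
    "nstemi" "K7031"
    "nstemi" "Z888"
    "var1"   "O432"
    "var2"   "K590"
    end
    tempfile crit
    save `crit'

    * Many-to-one merge on DX; unmatched criteria rows are dropped
    use `dxlong', clear
    merge m:1 DX using `crit', keep(master match)

    * Rows whose code is an exact match for an nstemi criterion
    list id which DX if var == "nstemi"
    ```

    Run as a single do-file so that the tempfile names remain defined throughout.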
    Last edited by Andrew Musau; 06 Sep 2021, 05:00.



    • #17
      Thanks very much Prof Musau!
      I had not considered reshaping the data to long layout, but it is clearly an efficient solution.
      I now have a new problem that seems to rule out the merge strategy, though. The codes in DX1-DX40 may not be exactly the same as those listed in the criteria. For example, the criteria for the variable nstemi may be "K7031", "Z888", but DX1 can be something like Z8888 or K70312. The function 'inlist' seems to work only when DX1-DX40 hold exactly the same string as listed in the criteria, whereas I need the variable set to 1 whenever the first few characters of DX1-DX40 match a criterion.

      If there is only one code in the criteria I can write:
      Code:
      gen nstemi=0
      foreach v of varlist DX1-DX40 {
        replace nstemi=1 if strpos(`v', "K7031")
      }
      This sets nstemi=1 even if DX1 is K70312, but unlike 'inlist', 'strpos' cannot search for several strings at once, unless I write
      Code:
      strpos(`v', "K7031") | strpos(`v', "Z888")
      .
      Can I somehow combine 'strpos' and 'inlist' so that the program can search for multiple strings at the same time but only in the first few digits of the variables?

      Thank you very much for your time!
      Ginny



      • #18
        You can use the -regexm()- function for this.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str6 var str15 criteria
        "nstemi" `""K7031", "Z888""'
        "var1"   `""O432""'         
        "var2"   `""K590""'         
        "var3"   `""O432","K590""'  
        end
        l
        local total= _N
        gen all= var+" ("+ustrregexra(ustrregexra(criteria, "([^a-zA-Z0-9\,])", ""), "[\,]", "|")+")"
        forval i=1/`=_N'{
            local c`i'= all[`i']
        }
        
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str5(DX1 DX2 DX3) 
        "K7031" "BC112" "K590"
        "VRDX2" "O4321" "K101"
        "V2KL"  "K5902"  "M111"
        "Z8888" "BC1234" "CC120"
        end
        
        forval i=1/`total'{
            gen `=word("`c`i''", 1)' = 0
            foreach v of varlist DX1-DX3{
                * Word 1 of the stored string is the variable name; word 2 is the pattern, e.g. (K7031|Z888)
                replace `=word("`c`i''", 1)' = regexm(`v', "`=word("`c`i''", 2)'") if `=word("`c`i''", 1)'==0
            }
        }
        l
        Results:

        Code:
        . l
        
             +-----------------------------------------------------+
             |   DX1     DX2     DX3   nstemi   var1   var2   var3 |
             |-----------------------------------------------------|
          1. | K7031   BC112    K590        1      0      1      1 |
          2. | VRDX2   O4321    K101        0      1      0      1 |
          3. |  V2KL   K5902    M111        0      0      1      1 |
          4. | Z8888   BC123   CC120        1      0      0      0 |
             +-----------------------------------------------------+
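
        Post #17 asked for matches on the first few characters only, whereas -regexm()- matches a pattern anywhere in the string unless the pattern is anchored. A minimal sketch of an anchored version for a single criterion, assuming the codes are typed into a hypothetical local named nstemi_codes rather than read from the criteria dataset; the fifth data row is made up to show a code that contains Z888 but does not start with it.

        ```stata
        * Toy data; the last row holds a code that contains Z888 but not at the start
        clear
        input str6(DX1 DX2 DX3)
        "K7031" "BC112"  "K590"
        "VRDX2" "O4321"  "K101"
        "V2KL"  "K5902"  "M111"
        "Z8888" "BC1234" "CC120"
        "AZ888" "M111"   "K101"
        end

        * Hypothetical local: the nstemi codes joined with | for use in a regular expression
        local nstemi_codes "K7031|Z888"

        gen byte nstemi = 0
        foreach v of varlist DX1-DX3 {
            * The ^ anchors the match at the first character of the code
            replace nstemi = 1 if regexm(`v', "^(`nstemi_codes')")
        }
        list
        ```

        With the anchor, rows 1 and 4 are flagged and the made-up fifth row is not; without it, the fifth row would be flagged as well.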



        • #19
          Thanks a lot Prof Musau! That really solved my problems!

