  • putting a long non-variable list as local macro

    I apologize in advance if this has been addressed in earlier threads. I was unable to find it, but I may have been searching with the wrong keywords.

    My data looks like this:

    patid .... var_code1 var_code2 .. var_code50
    1________eye01____ear05______gas03
    2________gas05____gas03______eye02
    3________ear01_____ear02______ .
    ..... with about 30 different prefixes and suffixes of 04-20.


    I am using Stata 13.1 on Windows and I would like to make the data look like this:
    patid .... abc_eye01 abc_eye02 .. abc_eye20 abc_ear01-abc_ear15 abc_gas01-abc_gas20 ...
    1 _________ 1 ______ 0 ______ 0 ....
    2 _________ 0 ______ 1 ______ 0 ....
    ...


    What I think I need to do is:

    local abc "??"
    foreach j of local abc {
    gen abc_`j'=0
    foreach i of varlist var_code1-var_code50{
    replace abc_`j'=1 if `i'=="`j'"
    }
    }

    But I don't know how to put the list of new variables-to-be into the macro.

    I tried local abc "eye01-eye20 ear01-ear15 gas01-gas20", but this just created three entries (eye01-eye20, ear01-ear15 and gas01-gas20) instead of eye01, eye02, eye03, etc.

    I found some useful information about using unab or ds, but since these variables don't exist in the dataset yet, I couldn't see how to use those commands.

    Could someone help me to figure out how to write the macro command?

    Or am I approaching this incorrectly, and should I be doing it another way?

    Thank you for taking the time to read all this!

  • #2
    I don't think you need any of that.

    Code:
    . clear
    
    . input patid str5 code1 str5 code2 str5  code50
    
             patid      code1      code2     code50
      1. 1        eye01    ear05      gas03
      2. 2        gas05    gas03      eye02
      3. 3        ear01     ear02      ""
      4. end
    
    . reshape long code , i(patid)
    (note: j = 1 2 50)
    
    Data                               wide   ->   long
    -----------------------------------------------------------------------------
    Number of obs.                        3   ->       9
    Number of variables                   4   ->       3
    j variable (3 values)                     ->   _j
    xij variables:
                         code1 code2 code50   ->   code
    -----------------------------------------------------------------------------
    
    . drop if missing(code)
    (1 observation deleted)
    
    . rename _j i_
    
    . replace i_ = 1
    (5 real changes made)
    
    . reshape wide i_, i(patid) j(code) string
    (note: j = ear01 ear02 ear05 eye01 eye02 gas03 gas05)
    
    Data                               long   ->   wide
    -----------------------------------------------------------------------------
    Number of obs.                        8   ->       3
    Number of variables                   3   ->       8
    j variable (7 values)              code   ->   (dropped)
    xij variables:
                                         i_   ->   i_ear01 i_ear02 ... i_gas05
    -----------------------------------------------------------------------------
    
    . renpfix i_
    
    . mvencode *, mv(0)
           ear01: 2 missing values recoded
           ear02: 2 missing values recoded
           ear05: 2 missing values recoded
           eye01: 2 missing values recoded
           eye02: 2 missing values recoded
           gas03: 1 missing value recoded
           gas05: 2 missing values recoded
    
    . list
    
         +---------------------------------------------------------------+
         | patid   ear01   ear02   ear05   eye01   eye02   gas03   gas05 |
         |---------------------------------------------------------------|
      1. |     1       0       0       1       1       0       1       0 |
      2. |     2       0       0       0       0       1       1       1 |
      3. |     3       1       1       0       0       0       0       0 |
         +---------------------------------------------------------------+



    • #3
      It looks to me like you can get what you want with

      Code:
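      * Stack var_code1-var_code50 into one column, one row per patient and slot
      reshape long var_code, i(patid) j(_j)
      * Collect every distinct code that occurs (empty strings are skipped)
      levelsof var_code, local(codes)
      foreach c of local codes {
          * Indicator: 1 on rows where this code appears
          gen byte abc_`c' = (var_code == `"`c'"')
      }
      * Back to one row per patient; the max is 1 if the code was ever mentioned
      collapse (max) abc*, by(patid)
      order abc_*, sequential after(patid)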
      The above code assumes that if a patid record mentions a given var_code more than once, you still want the corresponding abc_ variable to be 1. If, instead, you want a count of mentions, replace (max) with (sum) in the -collapse- statement.
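
To make the (max) versus (sum) distinction concrete, here is a minimal sketch on a made-up three-slot dataset (the values are hypothetical, not the poster's data) in which patient 1 mentions eye01 twice. It uses the same steps as the code above, with (sum) in the -collapse-:

```stata
* Toy data (hypothetical values): patient 1 mentions eye01 twice
clear
input patid str5 var_code1 str5 var_code2 str5 var_code3
1 "eye01" "eye01" "gas03"
2 "gas05" "gas03" ""
end

reshape long var_code, i(patid) j(_j)
levelsof var_code, local(codes)
foreach c of local codes {
    gen byte abc_`c' = (var_code == `"`c'"')
}

* (sum) counts mentions per patient; (max) would give a 0/1 indicator instead
collapse (sum) abc_*, by(patid)
list
```

With (sum), abc_eye01 should come out as 2 for patient 1, where (max) would report 1.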

      Hope this helps.



      • #4
        Just a note on Nick's use of renpfix:

        From help renpfix:

        As of Stata 12.0, renpfix has been superseded by the new syntaxes allowed with rename, which are documented in [D] rename group.
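
As a minimal sketch of the newer syntax, assuming Stata 12 or later: a wildcard -rename- can swap the i_ prefix left by -reshape wide- for the abc_ prefix asked for in the original question. The tiny dataset below is made up purely for illustration.

```stata
* Toy data (hypothetical): indicator variables carrying the i_ prefix
clear
input patid i_ear01 i_eye01 i_gas03
1 0 1 1
2 0 0 1
end

* Replace the i_ prefix with abc_ for every matching variable in one step
rename i_* abc_*
describe
```

The other wildcard patterns, including ones that only remove a prefix, are covered in help rename group.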
        You should:

        1. Read the FAQ carefully.

        2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

        3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

        4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.



        • #5
          Thank you so much for the help... I really appreciate it!

          A quick question, is there a way to do this efficiently without reshape? I only ask because I have ~1.4 million observations, which is quite slow to reshape.

          But I am very happy to have a working solution, thank you again.



          • #6
            Yes, reshape is slow. If you know ahead of time what the 30 different prefixes are and don't mind having 600 variables created, one for each combination of the 30 prefixes and 20 suffixes, then you can just do some more loops:

            Code:
            foreach prefix in eye ear gas ... {
                forvalues x = 1/20 {
                    local j = "`prefix'" + string(`x', "%02.0f")
                    gen abc_`j' = 0
                    foreach var of varlist var_code1-var_code50 {
                        replace abc_`j' = 1 if `var' == "`j'"
                    }
                }
            }
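
For the question in the thread title, the same nested loops can also accumulate the code names into a local macro. A macro only stores text, so a range such as eye01-eye20 is kept as one literal token; that shorthand is expanded only in a varlist of variables that already exist. The sketch below is one possible way to do it, with two prefixes, two suffixes and made-up data standing in for the real lists:

```stata
* Toy data (hypothetical codes); the real data has var_code1-var_code50
clear
input patid str5 var_code1 str5 var_code2
1 "eye01" "ear02"
2 "ear01" ""
end

* Accumulate eye01 eye02 ear01 ear02 into the local macro abc
local abc
foreach prefix in eye ear {
    forvalues x = 1/2 {
        local abc `abc' `prefix'`=string(`x', "%02.0f")'
    }
}
display "`abc'"

* The loop from the original question can then run over the macro
foreach j of local abc {
    gen byte abc_`j' = 0
    foreach i of varlist var_code1-var_code2 {
        quietly replace abc_`j' = 1 if `i' == "`j'"
    }
}
list
```

Storing the indicators as byte and running -replace- quietly are small choices that may help a little with 1.4 million observations; they do not change the result.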

