  • Using foreach command to create a new variable

    Hi,
    I am new to Stata. I am trying to use the foreach command to loop over a list of variables (5) and create a new variable with a value of 1 if any of the 40 variables contains a specific string value (I12, for example).
    This is the code I am using, but it only processes the first variable and returns the results without going over the rest. What am I doing wrong here?

    foreach i of varlist (variable1 - variable5) {
    gen kin=1 if `i'==3,
    }

    Thank you for your help

  • #2
    Hi Michael,

    I would remove the brackets and also the comma:

    Code:
    foreach i of varlist variable1-variable5  {
    gen kin=1 if `i'==3
    }
    Best,
    Rhys



    • #3
      Thank you so much, Rhys. I have removed the brackets and the comma. The loop runs but only goes through the first variable and none of the remaining 4 variables. I am not sure what I am doing wrong.
      This is the new code I am using:


      foreach i of varlist variable1-variable5 {
      gen kin=1 if `i'==3
      }



      • #4
        Odd... it looks okay to me. Do you get an error message or something?
        Perhaps you could expand your varlist manually (to see if that works) and use:

        Code:
        foreach i of varlist variable1 variable2 variable3 variable4 variable5 {
        gen kin=1 if `i'==3
        }



        • #5
          Thank you so much. I tried it again but it is still not working. It only goes over variable1. Is there a reason why a new variable can only be generated from the conditions on the first variable and not the others?



          • #6
            The main problem with #1 is repeated in #2, #3 and #4. The second time around the loop, kin already exists and so can't be generated again.

            You don't need a loop here.

            Code:
            gen kin = inlist(3, variable1, variable2, variable3, variable4, variable5)
            or

            Code:
            egen kin = anymatch(variable1-variable5), values(3)


            In #1 the opening is about string variables, but the example is about numeric variables.
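
            A minimal sketch of the point above, on made-up data (the values are hypothetical, and kin_loop, kin_inlist and kin_any are names chosen only for this illustration). It creates the indicator once, updates it with replace inside a loop, and checks that the two loop-free versions give the same answer:

            ```stata
            * Toy data (hypothetical values) with five numeric variables
            clear
            input variable1 variable2 variable3 variable4 variable5
            1 2 3 4 5
            0 0 0 0 0
            3 3 1 1 1
            end

            * -generate- inside the loop fails on the second pass because the
            * variable already exists, so create it once and then -replace-
            gen kin_loop = 0
            foreach v of varlist variable1-variable5 {
                replace kin_loop = 1 if `v' == 3
            }

            * Loop-free equivalents from this post
            gen kin_inlist = inlist(3, variable1, variable2, variable3, variable4, variable5)
            egen kin_any = anymatch(variable1-variable5), values(3)

            list
            ```

            Note that all three versions return 0 rather than missing when no variable equals 3, unlike gen kin=1 if ..., which leaves missing values.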



            • #7
              Thank you so much, Nick. What if variables 1-5 are string variables rather than numeric variables? Would egen work?



              • #8
                The reason I wanted a loop is that I want to check several conditions across the variables. For example, I want kin=1 if any of the variables contains the value 3 or 4.
                Would the correct code be
                gen kin = inlist(3|4, variable1, variable2, variable3, variable4, variable5) ?



                • #9
                  No. The help for egen explains. The argument to values() must be one or more integers.

                  To get good advice on a string version, please give concrete data examples and the code you tried. See #12 in https://www.statalist.org/forums/help
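
                  On the numeric question in #8, a hedged sketch with made-up data: as far as I understand it, 3|4 is evaluated as a logical expression and yields 1, so inlist(3|4, ...) would test for the value 1 rather than for 3 or 4. Two ways to match either value are shown below; kin and kin2 are names used only for this example.

                  ```stata
                  * Toy data (hypothetical values)
                  clear
                  input variable1 variable2 variable3 variable4 variable5
                  1 2 3 5 5
                  0 0 0 0 0
                  1 4 1 1 1
                  end

                  * values() takes one or more integers, so 3 and 4 can be listed together
                  egen kin = anymatch(variable1-variable5), values(3 4)

                  * Loop version: inlist() takes the variable first, then the candidate values
                  gen kin2 = 0
                  foreach v of varlist variable1-variable5 {
                      replace kin2 = 1 if inlist(`v', 3, 4)
                  }

                  list
                  ```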



                  • #10
                    Thank you Nick. Here is what I am trying to do. I have variables I10_DX1 to I10_DX40 (40 variables), which contain string codes for admission diagnoses in a large database.
                    I want to create a variable called NSTEMI and give it a value of 1 if any of the admission diagnoses "Z888", "I124" or "GH321" appears in any of the 40 variables.

                    Here is the initial code I used

                    foreach i of varlist I10_DX1 - I10_DX40{
                    gen nstemi=1 if `i'== "K7031"| `i'=="Z888"
                    }

                    Now I understand it does not work because nstemi can only be generated once.
                    How can I overcome this?
                    Thank you so much for your help.



                    • #11

                      Code:
                      gen nstemi = 0 
                      
                      foreach v of varlist I10_DX1-I10_DX40 {
                             replace nstemi = 1 if inlist(`v', "K7031", "Z888") 
                      }
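
                      To see the pattern run end to end, here is a small sketch on made-up data with only three diagnosis variables (the codes other than "K7031" and "Z888" are invented for illustration). One caveat, as far as I recall from the function help: inlist() accepts only a limited number of string arguments, so a long list of codes may need to be split across several inlist() calls.

                      ```stata
                      * Toy data: three string diagnosis variables (hypothetical codes)
                      clear
                      input str5(I10_DX1 I10_DX2 I10_DX3)
                      "K7031" "A000" "B111"
                      "C222"  "D333" "E444"
                      "F555"  "Z888" ""
                      end

                      * Create the indicator once, then switch it on wherever a code matches
                      gen nstemi = 0
                      foreach v of varlist I10_DX1-I10_DX3 {
                          replace nstemi = 1 if inlist(`v', "K7031", "Z888")
                      }

                      list
                      ```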



                      • #12
                        This worked perfectly, thank you so much Nick.



                        • #13
                          Dear all,
                          A follow-up question here.
                          I am wondering how to modify this code so that the criteria for "nstemi", and for other variables we intend to generate by going through varlist DX1-DX40, are imported from another dataset:
                          Code:
                          gen nstemi =0
                          foreach v of varlist DX1-DX40 {
                              replace nstemi = 1 if inlist(`v', "K7031", "Z888")
                          }
                          In this code we have to type the criteria, such as ("K7031", "Z888"), for every variable we want to generate. If many variables need to be generated from varlist DX1-DX40, we have to write this loop many times, once for each variable and its criteria.
                          Suppose we have a dataset like the following, which lists all the variables we want to generate and their corresponding criteria:
                          Code:
                          * Example generated by -dataex-. To install: ssc install dataex
                          clear
                          input str6 var str15 criteria
                          "nstemi" `""K7031", "Z888""'
                          "var1"   `""O432""'         
                          "var2"   `""K590""'         
                          "var3"   `""O432","K590""'  
                          end
                          Would it be possible to combine this criteria dataset with the foreach loop so that it generates all the variables listed there? Something like a loop within a loop?
                          I have been reading about Stata programming and about storing things in local macros, but I don't know how to store the information from the criteria dataset or how to use it in the foreach loop to generate multiple new variables.

                          Thanks very much for your help!

                          Ginny



                          • #14
                            I would not do it this way with locals, but here you go.

                            Code:
                            * Example generated by -dataex-. To install: ssc install dataex
                            clear
                            input str6 var str15 criteria
                            "nstemi" `""K7031", "Z888""'
                            "var1"   `""O432""'         
                            "var2"   `""K590""'         
                            "var3"   `""O432","K590""'  
                            end
                            l
                            local total= _N
                            gen all= var +" " + criteria
                            forval i=1/`=_N'{
                                local c`i'= all[`i']
                            }
                            
                            * Example generated by -dataex-. For more info, type help dataex
                            clear
                            input str5(DX1 DX2 DX3) 
                            "K7031" "BC112" "K590"
                            "VRDX2" "O432" "K101"
                            "V2KL"  "K590"  "M111"
                            end
                            
                            forval i=1/`total'{
                                gen `=word("`c`i''", 1)' = 0
                                foreach v of varlist DX1-DX3{
                                    replace `=word("`c`i''", 1)' = inlist(`v', `=substr(trim(`"`c`i''"'), length("`=word("`c`i''", 1)'")+1, .)') if `=word("`c`i''", 1)'==0
                                }
                            }
                            l
                            Results:

                            Code:
                            . l
                            
                                 +--------------------------+
                                 |    var          criteria |
                                 |--------------------------|
                              1. | nstemi   "K7031", "Z888" |
                              2. |   var1            "O432" |
                              3. |   var2            "K590" |
                              4. |   var3     "O432","K590" |
                                 +--------------------------+
                            
                            . l
                            
                                 +----------------------------------------------------+
                                 |   DX1     DX2    DX3   nstemi   var1   var2   var3 |
                                 |----------------------------------------------------|
                              1. | K7031   BC112   K590        1      0      1      1 |
                              2. | VRDX2    O432   K101        0      1      0      1 |
                              3. |  V2KL    K590   M111        0      0      1      1 |
                                 +----------------------------------------------------+
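
                            For comparison, a simpler sketch for the case where the criteria are few enough to type directly into the do-file. This is not the approach in the code above: the macro names crit_nstemi and crit_var3 are hypothetical, and the codes are compared one at a time with == rather than passed to inlist(). It reuses the same toy DX data.

                            ```stata
                            * Same toy diagnosis data as above
                            clear
                            input str5(DX1 DX2 DX3)
                            "K7031" "BC112" "K590"
                            "VRDX2" "O432" "K101"
                            "V2KL"  "K590"  "M111"
                            end

                            * One local macro per new variable, holding its codes (hypothetical names)
                            local crit_nstemi K7031 Z888
                            local crit_var3 O432 K590

                            foreach newvar in nstemi var3 {
                                gen `newvar' = 0
                                foreach v of varlist DX1-DX3 {
                                    * crit_`newvar' expands to crit_nstemi or crit_var3
                                    foreach code of local crit_`newvar' {
                                        replace `newvar' = 1 if `v' == "`code'"
                                    }
                                }
                            }

                            list
                            ```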



                            • #15
                              Thanks Prof. Musau! The code you suggested worked nicely and it really saved the trouble of writing the foreach loop for every variable.

                              It would be great if you could help with two further questions:

                              1. How would you do it, given that you said you would not do it this way? I don't have a preference on coding approach. The setup is that I have 2 datasets: the first with the diagnosis codes (DX1-DX40), the second with the variables and their criteria. The question is how to generate the variables in the first dataset by checking their criteria against DX1-DX40. Would it be possible for Stata to import the variable names and criteria from an outside file (e.g. a csv) and use them in the foreach loop?

                              2. Is there a way to add a bypass to the loop? The foreach loop searches DX1-DX40 for the criteria, but for some observations the generated variable is already set to 1 by DX1 or DX2, which makes the search in DX3 to DX40 redundant for those observations. If we could skip the search for these observations, would the program be more efficient (especially with a huge number of observations)?


                              Thanks very much!
                              Ginny

