  • Loops within loops using foreach and local

    I'm having trouble with a loop. I'm working on medical data and I want to create a flag for a category of diagnoses. This diagnosis category has 2,000+ codes (alphanumeric strings) associated with it, which I have stored in a separate Excel file. I want to bring these in and use them in a loop as part of my replace command, which is itself inside a loop. But I get an error telling me that the first diagnosis code is invalid. I assume this is a problem with how I am nesting the loops, or even something as simple as how string variables are used here. Here is the malfunctioning code:

    levelsof codes, local(levels)
    gen category = .
    foreach x of varlist diagnosiscode2-diagnosiscode15{
    foreach y of local levels
    replace category = 1 if `x' = `y'
    }
    }

    Any ideas on how to build this properly?

    -LM
    Last edited by Lyden Marcellot; 11 Sep 2017, 15:27. Reason: foreach, loop, local, levelsof, replace

  • #2
    You should report the actual error message that Stata gave you. I'm sure Stata is not telling you that your diagnosis code is invalid: Stata doesn't know what diagnosis codes are and has no opinion about their validity. It's telling you that something in your syntax is invalid.

    Without seeing it, here's my best guess. I'm guessing that these diagnosis codes, both the ones indexed by `x' and those indexed by `y', are strings, not numbers. If that's the case, the problem is that you need to enclose `y' in quotes. (You do not need to put `x' in quotes, because I'm assuming that you are testing whether the value of one of those diagnosiscode* variables is equal to the string contained in `y'.) Thus:

    Code:
    levelsof codes, local(levels)
    gen category = .
    foreach x of varlist diagnosiscode2-diagnosiscode15{
        foreach y of local levels {
            replace category = 1 if `x' = `"`y'"'
        }
    }
    That said, I see that you are generating category as a 1/. variable. This is an accident waiting to happen. In Stata, when you have a dichotomy, the safe approach is to code it as 1/0, using . only for observations where it is unknown whether 0 or 1 applies.
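To make the risk concrete, here is a small sketch on made-up toy data (the two rows are hypothetical). In Stata a missing value compares as larger than any number, so a 1/. flag silently counts missings as positives, and the non-cases cannot be picked out with == 0:

```stata
* Toy data (hypothetical): one case coded 1, one non-case left missing (1/. coding)
clear
input byte category
1
.
end

* Missing (.) is treated as larger than any number, so BOTH rows satisfy this
count if category > 0

* The non-case was never coded 0, so this finds nothing
count if category == 0
```

Starting from generate byte category = 0 and only replacing with 1 avoids both problems.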

    All of that said, you have your diagnosiscodes scattered among at least 14 variables in wide layout. It is hard to make a judgment about this without seeing the layout of the other variables, but there is a good chance that you would be better off -reshape-ing your data to long. If so, you would be able to dispense with the outer loop altogether. And I can imagine further modifications to your data structure that would eliminate that loop as well. But this is starting to get speculative, as you do not show your example data here.
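A minimal sketch of the reshape-and-merge route, assuming a patient identifier (called patid here; that name, slot, in_category and all the toy values are hypothetical) and string diagnosis codes. In real use the lookup list would come from the Excel file via import excel rather than input. The merge does the matching, so neither loop is needed:

```stata
* Toy patient data (hypothetical): three diagnosis slots per patient
clear
input str8 patid str6 diagnosiscode2 str6 diagnosiscode3 str6 diagnosiscode4
"p1" "A01" "B20" ""
"p2" "C50" ""    ""
end
tempfile patients
save `patients'

* Toy lookup list of category codes (stands in for the Excel file)
clear
input str6 diagnosiscode
"B20"
"Z99"
end
generate byte in_category = 1
tempfile catcodes
save `catcodes'

* Long layout: one row per patient per diagnosis slot
use `patients', clear
reshape long diagnosiscode, i(patid) j(slot)

* Flag rows whose code is in the lookup list; unmatched rows get 0
merge m:1 diagnosiscode using `catcodes', keep(master match) nogenerate
replace in_category = 0 if missing(in_category)

* Back to one row per patient: 1 if any slot matched, else 0
collapse (max) category = in_category, by(patid)
list
```

With real data, the collapsed result could then be merged back onto the original file by patid. The design point is that merge performs one keyed match instead of one replace per code per variable, which adds up quickly with 2,000+ codes.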




    • #3
      In addition to Clyde's helpful suggestions, note that testing for equality always requires == rather than =.
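Combining the compound quotes around `y', the == test, an opening brace on the inner foreach, and 1/0 coding, a corrected version might look like this. It is a sketch on made-up toy data and assumes, as in the original post, that the codes sit in a string variable codes in the current dataset; quietly only suppresses the many "real changes made" messages:

```stata
* Toy data (hypothetical): two diagnosis slots plus a short list of codes
clear
input str6 diagnosiscode2 str6 diagnosiscode3 str6 codes
"A01" "B20" "B20"
"C50" ""    "Z99"
end

levelsof codes, local(levels)
* Start every observation at 0 so non-cases are coded, not missing
generate byte category = 0
foreach x of varlist diagnosiscode2-diagnosiscode3 {
    foreach y of local levels {
        * == tests equality; `"`y'"' makes each code a string literal
        quietly replace category = 1 if `x' == `"`y'"'
    }
}
list
```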



      • #4
        Yes, good point, Nick. My oversight.



        • #5
          All great comments guys. I was able to get it to work with your comments. Thanks!



          • #6
            Hello everyone,

            While googling my current coding issue, I stumbled upon this thread and thought I'd try my question here. Maybe an experienced user can help me:

            I'm trying to estpost results for four values of "vignettes" (1 2 3 4) and three values of "year" (1996 2006 2018) in an efficient way. To check my logic I tried:

            Code:
            levelsof vignettes, local(aux4)
            foreach i of local aux4{
                foreach j in 1996 2006 2018 {
                    display `i'`j'
            }    
            }
            As expected, this yields the following output:

            Code:
            . levelsof vignettes, local(aux4)
            1 2 3 4
            
            . foreach i of local aux4{
              2.         foreach j in 1996 2006 2018 {
              3.                 display `i'`j'
              4. }       
              5. }
            11996
            12006
            12018
            21996
            ...
            However, when I then go to try:

            Code:
            levelsof vignettes, local(aux4)
            foreach i of local aux4{
                foreach j in 1996 2006 2018 {
                    eststo `i'`j' : estpost summarize rec_* if vignettes == `i' & year == `j'
            }    
            }
            I am shown:

            Code:
            . levelsof vignettes, local(aux4)
            1 2 3 4
            
            . foreach i of local aux4{
              2.         foreach j in 1996 2006 2018 {
              3.                 eststo `i'`j' : estpost summarize rec_* if vig
            > nettes == `i' & year == `j'
              4. }       
              5. }
            11996 invalid name
            r(198);
            How can I change this? Is there a workaround that still lets me build the names from the loop values? What does "invalid name" mean in this context, and why is it shown?
            Any help is much appreciated. Thanks!



            • #7
              I don't think this has anything to do with loops. There are rules for what counts as a name in Stata, and names can't consist entirely of digits. I suspect that an initial letter, say m, would be enough to solve the problem.

              Here are some extracts from [U] 11.3:

              A name is a sequence of 1 to 32 letters (A–Z, a–z, and any Unicode letter), digits (0–9), and underscores (_).

              The first character of a name must be a letter or an underscore (macro names are an exception; they may also begin with a digit). We recommend, however, that you not begin your variable names with an underscore. All of Stata’s built-in variables begin with an underscore, and we reserve the right to incorporate new variables freely.

              Stata respects case; that is, myvar, Myvar, and MYVAR are three distinct names.

              Most objects in Stata—not just variables—follow this naming convention.
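To check a candidate name against these rules from inside Stata, one option is confirm names, which exits with an error for an invalid name. The strtoname() function converts an arbitrary string into a legal name, prefixing an underscore when the string starts with a digit. A small sketch:

```stata
* capture keeps the do-file running; _rc is nonzero if the name is invalid
capture confirm names 11996
display "11996 valid? " cond(_rc == 0, "yes", "no")

capture confirm names r11996
display "r11996 valid? " cond(_rc == 0, "yes", "no")

* strtoname() repairs a string into a legal name (leading digit -> underscore)
display strtoname("11996")
```

So strtoname("`i'`j'") would also produce a name -eststo- accepts, although a letter prefix is easier to read.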



              • #8
                The material between -eststo- and the colon (:) must be a valid name for an estimation set. Valid names for estimation sets cannot be just numbers. They are subject to the same constraints as variable names: the first character must be a letter or an underscore (_). So change -eststo `i'`j' :- to something like -eststo r`i'`j' :-. (It doesn't have to be r; that's just a suggestion because it evokes "results." It can be any letter(s) or an underscore.)
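Here is the loop from #6 with that one change, run on made-up toy data so it is self-contained (the data-generating lines are hypothetical; vignettes, year and rec_* are the names from #6). It assumes the community-contributed estout package, which provides eststo and estpost, is installed (ssc install estout):

```stata
* Toy data (hypothetical): 4 vignettes x 3 years, 20 obs per cell
clear
set seed 12345
set obs 240
generate vignettes = mod(_n, 4) + 1
generate year = cond(_n <= 80, 1996, cond(_n <= 160, 2006, 2018))
generate rec_1 = runiform()
generate rec_2 = runiform()

eststo clear
levelsof vignettes, local(aux4)
foreach i of local aux4 {
    foreach j in 1996 2006 2018 {
        * Letter prefix r turns 11996 into the valid name r11996
        eststo r`i'`j' : estpost summarize rec_* if vignettes == `i' & year == `j'
    }
}
* List the twelve stored estimation sets
estimates dir
```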

                Added: Crossed with #2.



                • #9
                  Thank you, Nick. Thank you, Clyde. I don't know how I missed that. Now I'll remember.
