  • How to write a double foreach loop

    Hi everyone,

    I have a question about writing a double loop. Below is my data for reference. In my data, I have a student identifier called id. I have a variable called sem which indicates the semester in which a student first enrolled in school. The variable sem has four possible values. A 5 represents the Fall semester. A 2 represents the Spring semester. A 3 represents Summer Session 1. A 4 represents Summer Session 2. The other variable is called year and it represents the calendar year in which the student enrolled in the school.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id sem year)
     1 5 2011
     2 2 2012
     3 3 2012
     4 4 2012
     5 5 2012
     6 2 2013
     7 3 2013
     8 4 2013
     9 5 2013
    10 2 2014
    11 3 2014
    12 4 2014
    13 5 2014
    14 2 2015
    15 3 2015
    16 4 2015
    17 5 2015
    18 2 2016
    19 3 2016
    20 4 2016
    21 5 2016
    22 2 2017
    23 3 2017
    24 4 2017
    25 5 2017
    26 2 2018
    27 3 2018
    28 4 2018
    29 5 2018
    30 2 2019
    31 3 2019
    32 4 2019
    33 5 2019
    34 2 2020
    35 3 2020
    36 4 2020
    end
    I am trying to identify the academic year that a student enrolled in the school. An academic year runs from Fall to Summer Session 2. Thus a student who enrolled in the first academic year may have enrolled in either:
    1. Fall 2011
    2. Spring 2012
    3. Summer Session 1 2012
    4. Summer Session 2 2012
    For individuals enrolled in the Fall semester of a particular calendar year, I can successfully identify the academic year that they enrolled using the following loop:


    Code:
    *Generate the academic year variable
    gen academic_year_enroll=.
    
    *Create for all Fall semesters from 2011-2019
    *i indexes for both academic year and calendar year since they are the same
    foreach i of num 1/9 {
        replace academic_year_enroll = `i' if year==201`i' & sem==5
        }

    For individuals enrolled in the Spring semester of a particular calendar year, I cannot successfully identify the academic year that they enrolled in. I am using the following loop:


    Code:
    *All Spring semesters from 2012-2020
    *i should index the academic year
    *j should index the calendar year
    *sem is equal to 2 because I am looking for the Spring semester.
    foreach i of num 1/9 {
        foreach j of num 12/20 {
            replace academic_year_enroll = `i' if year==20`j' & sem==2
            }
    }
    Instead, I manually wrote out what I wanted:
    Code:
    replace academic_year_enroll = 1 if year==2012 & sem==2
    replace academic_year_enroll = 2 if year==2013 & sem==2
    replace academic_year_enroll = 3 if year==2014 & sem==2
    replace academic_year_enroll = 4 if year==2015 & sem==2
    replace academic_year_enroll = 5 if year==2016 & sem==2
    replace academic_year_enroll = 6 if year==2017 & sem==2
    replace academic_year_enroll = 7 if year==2018 & sem==2
    replace academic_year_enroll = 8 if year==2019 & sem==2
    replace academic_year_enroll = 9 if year==2020 & sem==2
    Can I ask someone for help in fixing my loop? Manually writing it out invites mistakes and I want to learn how to address this issue. I am just trying to improve my coding and feedback is highly appreciated. Thanks in advance.

  • #2
    No loops needed. Thank you for the excellent example data.

    So what we have is that if sem==5 the academic year is the calendar year minus 2010. And if sem is 2, 3, or 4 the academic year is the calendar year minus 2011.
    Code:
    . generate academic_year_enroll = year - cond(sem==5,2010,2011)
    
    . // demonstrate results (requires Stata 17)
    . table (academic_year_enroll year) (sem), nototal
    
    -------------------------------------
                         |       sem    
                         |  2   3   4   5
    ---------------------+---------------
    academic_year_enroll |              
      1                  |              
        year             |              
          2011           |              1
          2012           |  1   1   1    
      2                  |              
        year             |              
          2012           |              1
          2013           |  1   1   1    
      3                  |              
        year             |              
          2013           |              1
          2014           |  1   1   1    
      4                  |              
        year             |              
          2014           |              1
          2015           |  1   1   1    
      5                  |              
        year             |              
          2015           |              1
          2016           |  1   1   1    
      6                  |              
        year             |              
          2016           |              1
          2017           |  1   1   1    
      7                  |              
        year             |              
          2017           |              1
          2018           |  1   1   1    
      8                  |              
        year             |              
          2018           |              1
          2019           |  1   1   1    
      9                  |              
        year             |              
          2019           |              1
          2020           |  1   1   1    
    -------------------------------------
    You may find it useful to consult
    Code:
    help cond()
    to learn the details of the cond() function.
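
    As a minimal sketch of how cond() behaves here, the offset can be stored in its own variable before subtracting. The six-row dataset and the variable name offset are made up for illustration.

    ```stata
    clear
    input float(id sem year)
    1 5 2011
    2 2 2012
    3 3 2012
    4 4 2012
    5 5 2012
    6 2 2013
    end

    * cond(x, a, b) returns a where x is true and b where x is false,
    * evaluated separately for each observation
    generate offset = cond(sem == 5, 2010, 2011)
    generate academic_year_enroll = year - offset

    list id sem year offset academic_year_enroll
    ```

    Fall observations get an offset of 2010 and all other semesters get 2011, so Fall 2011 through Summer Session 2 2012 share academic year 1.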
    Last edited by William Lisowski; 23 Jan 2022, 15:26.



    • #3
      You're making this way more complicated than it needs to be. No loops are needed here. Moreover, it seems you are really not looking to calculate the academic year in which the person enrolled: that would be a number like 2011. You are instead trying to generate some sort of sequential academic year, starting with 2011 as sequential academic year 1.
      Code:
      gen academic_year = cond(sem == 5, year, year - 1)
      gen sequential_academic_year = academic_year - 2010
      Finally, if you are going to eventually have to do things like sort these in chronological order, you will need a sequential semester variable that starts with Fall and then increases through Spring, Summer 1, and Summer 2. You can get that with:
      Code:
      recode sem (5 = 1), gen(sequential_semester)
      Then if you need to sort the data chronologically you can use sequential_academic_year sequential_semester as the sort key.
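
      Put together, the steps above can be sketched as follows. The five-row dataset is invented for illustration and is deliberately out of chronological order.

      ```stata
      clear
      input float(id sem year)
      1 2 2012
      2 5 2011
      3 4 2012
      4 3 2012
      5 5 2012
      end

      gen academic_year = cond(sem == 5, year, year - 1)
      gen sequential_academic_year = academic_year - 2010

      * Fall (5) becomes 1; Spring (2), Summer 1 (3), and Summer 2 (4) are unchanged
      recode sem (5 = 1), gen(sequential_semester)

      * chronological order: academic year first, then semester within the year
      sort sequential_academic_year sequential_semester
      list id sem year sequential_academic_year sequential_semester
      ```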

      Added: Crossed with #2 whose solution is essentially the same as mine.




      • #4
        @William and @Clyde. Thank you so much for your suggestions. Both work like a charm. Using the cond() function was a better strategy. I will try to incorporate this function into my toolkit.

        While I am still on this coding issue, is there a way to write a loop to accomplish what I want? I don't want to lose out on a teachable moment. In the code below, I want the loop to pass through i and j in parallel. I would really appreciate your suggestions. Thanks.

        Code:
         
        *All Spring semesters from 2012-2020
        *i should index the academic year
        *j should index the calendar year
        *sem is equal to 2 because I am looking for the Spring semester.
        foreach i of num 1/9 {
            foreach j of num 12/20 {
                replace academic_year_enroll = `i' if year==20`j' & sem==2
            }
        }



        • #5
          You don't do parallel loops in Stata. Actually, I don't know of any programming language that does parallel loops. You do a single loop with two parallel indices:

          Code:
          forvalues i = 1/9 {
              local j = `i' + 11
              // do something with `i' and `j'
          }
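
          Applied to the Spring-semester case from #1, that pattern might look like the sketch below. The three-row dataset is made up for illustration. For contrast, a nested pair of loops over 1/9 and 12/20 runs the replace for all 81 (i, j) pairs, so every Spring observation ends up with the last value of i, which is 9.

          ```stata
          clear
          input float(id sem year)
          1 2 2012
          2 2 2016
          3 2 2020
          end

          gen academic_year_enroll = .

          * one loop; j is derived from i, so the two indices move together
          forvalues i = 1/9 {
              local j = `i' + 11
              replace academic_year_enroll = `i' if year == 20`j' & sem == 2
          }

          list
          ```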



          • #6
            Thank you, Clyde, for your response. Knowing that it is not possible to run parallel loops is helpful for how I want to think about coding. I may turn to the cond() function more often than a loop. Thanks for your help.



            • #7
              I think I may have figured out a way to run a parallel loop. I used the following thread and the fourth comment which is from Nick Cox.

              https://www.statalist.org/forums/for...ts-in-parallel


              I use the gettoken command. My code is below. It definitely is not as concise as Clyde's or William's suggestions, but if you want to run a parallel loop, this may be one option.

              Code:
              *Spring 2022
              *Spring semester sem==2
              local agrp "1 2 3 4 5 6 7 8 9"
              local bgrp "12 13 14 15 16 17 18 19 20"
              foreach a of local agrp {
                  gettoken b bgrp : bgrp
                  replace academic_year_enroll = `a' if year==20`b' & sem==2
                  }
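
              To see what gettoken does on each pass, here is a small sketch with a shortened list. It peels the first token off bgrp into b and stores the rest of the list back in bgrp, which is what keeps b in step with a in the loop above.

              ```stata
              local bgrp "12 13 14"

              * take the first token into b; put what is left back into bgrp
              gettoken b bgrp : bgrp

              display "first token: `b'"
              display "remaining list: `bgrp'"
              ```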



              • #8
                In my experience, looping over two lists is INSANELY useful; I pretty much always have some use or another for it, especially in data management.

                However, this doesn't seem to be one of those cases. If the parallel list loop does what you want, then okay, but if you can use cond, qbys, or similar constructs to get what you'd like, I'd go with that.



                • #9
                  I wrote about parallel loops in 2003. I won't give the reference here as that paper has been superseded by a paper just published, notionally in 2021:

                  See https://journals.sagepub.com/doi/pdf...6867X211063415

                  That said, when people ask for parallel loops they don't always need them. The code in #7 can be rewritten more simply. A first step is to see that you can go


                  Code:
                  forval b = 12/20 { 
                     replace academic_year_enroll = `b' - 11 if year==20`b' & sem == 2 
                  }
                  and then you notice that this is just

                  Code:
                  replace academic_year_enroll = year - 2000 - 11 if sem == 2
                  or indeed

                  Code:
                  replace academic_year_enroll = year - 2011 if sem == 2
                  with no loops whatsoever (except the loop over observations implied by replace). Naturally, if you went straight there that's good.

                  There's one warning. If for some reason the code really shouldn't apply to years other than 2012/2020 then you need the extra condition

                  Code:
                  & inrange(year, 2012, 2020)
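
                  As a sketch of the complete command with that condition attached (the three-row dataset, including the out-of-range year 2021, is made up for illustration):

                  ```stata
                  clear
                  input float(id sem year)
                  1 2 2012
                  2 2 2020
                  3 2 2021
                  end

                  gen academic_year_enroll = .

                  * inrange() restricts the replace to years 2012 through 2020
                  replace academic_year_enroll = year - 2011 if sem == 2 & inrange(year, 2012, 2020)

                  list
                  ```

                  The 2021 observation is left missing rather than being assigned academic year 10.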
                  I think there's an overarching small problem here. People often come to Stata with some experience in programming in other languages. That's fine: such people have a headstart over people with no programming experience whatsoever. But sometimes what's natural in other languages is not needed or not ideal in Stata.

                  Note that this is a variation on the theme in #2 and #3, with contributions from William Lisowski and Clyde Schechter.



                  • #10
                    Hi Nick. Thank you so much for your feedback. As Clyde, William, and Jared said, I didn't need the parallel loop. There is that saying: when you have a hammer, everything looks like a nail. Seeing how you, Clyde, and William dissected my question and found an intuitive solution without using a loop is a good lesson in how I want to think about coding issues. I appreciate this community for offering the space to ask questions and to receive helpful suggestions.

                    Thank you for referring to the recent Stata Journal article about gettoken. I will definitely read it. Thank you.
