Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • For Loop Using Macros and Levelsof

    Hi all,

    Background

    I am trying to run a for loop to automate the following process. I have panel data on university applicants, and I'm trying to determine the following information for each of the individual universities:
    • Number of applicants to each individual university
    • Number of applicants that received an offer to each individual university
    • Each university's offer rate (number of applicants that received offers/number of applicants that applied)
    • The acceptance rate of students that receive an offer to each university (number of applicants that attended said university/number of applicants that received offers to said university)
    So far, I've been able to build out the code to do this for one university at a time.

    Data Explanation

    Prospective students can apply to up to 10 schools, with their first choice being stored in the variable UniversityChoice1 and their tenth choice being stored in UniversityChoice10. I have data on whether or not they received an offer from all of the schools they applied to, with whether or not they received an offer to their first choice stored in the variable OfferToChoice1 and whether or not they received an offer to their tenth choice stored in the variable OfferToChoice10. I also have data on which university's offer they accepted, stored in the variable RegisteredUniversity.

    (UniversityACode) denotes the school code for University A in the UniversityChoiceX variable. So, if UniversityChoice1 == (UniversityACode), it means that that University A was the applicants first choice of school.

    I've run the code through with several schools and I am 100% sure that it returns the proper values. The code reads as follows:

    Code:
    *Tabulate Applicants To University A
    gen AppliedToUniversityA = 1 if UniversityChoice1 == (UniversityACode) |  UniversityChoice2 == (UniversityACode) |  UniversityChoice3 == (UniversityACode) |  UniversityChoice4 == (UniversityACode) |  UniversityChoice5 == (UniversityACode) |  UniversityChoice6 == (UniversityACode) | UniversityChoice7 == (UniversityACode) |  UniversityChoice8 == (UniversityACode) |  UniversityChoice9 == (UniversityACode) | UniversityChoice10 == (UniversityACode)
    recode AppliedToUniversityA (.=0)
    tab AppliedToUniversityA
    
    //we observe that 400,000 people applied to university A
    
    *Tabulate Those That Received An Offer To University A
    gen OfferToUniversityA = 1 if UniversityChoice1 == (UniversityACode) & OfferToChoice1 == 1 |  UniversityChoice2 == (UniversityACode) & OfferToChoice2 == 1 |  UniversityChoice3 == (UniversityACode) & OfferToChoice3 == 1 |  UniversityChoice4 == (UniversityACode) & OfferToChoice4 == 1 |  UniversityChoice5 == (UniversityACode) & OfferToChoice5 == 1 |  UniversityChoice6 == (UniversityACode) & OfferToChoice6 == 1 | UniversityChoice7 == (UniversityACode) & OfferToChoice1 == 7 |  UniversityChoice8 == (UniversityACode) & OfferToChoice8 == 1 |  UniversityChoice9 == (UniversityACode) & OfferToChoice9 == 1 | UniversityChoice10 == (UniversityACode) & OfferToChoice10 == 1
    recode OfferToUniversityA (.=0)
    tab OfferToUniversityA
    
    //we observe that 100,000 people received an offer to university A
    
    *Tabulate Offer Rate Of University A
    gen UniversityAOfferRate = (OfferToUniversityA/AppliedToUniversityA)
    tab UniversityAOfferRate
    
    //we observe an offer rate of 25%
    
    *Tabulate Acceptance Rate Of Those Who Receive An Offer
    gen WentToUniversityA = 1 if RegisteredUniversity = (UniversityACode)
    recode WentToUniversityA (.=0)
    gen UniversityAAcceptanceRate = (WentToUniversityA/OfferToUniversityA)
    tab UniversityAAcceptanceRate
    
    //we observe an acceptance rate of x%
    My Question

    Given that there are a huge number of schools, how can I automate this process with a for loop?

    I've tried to use levelsof with the RegisteredUniversity variable to create a local macro containing all of the different universities, but I've had no luck storing them by label instead of values and then using foreach to replicate the first chunk of code (*Tabulate Applicants To University A). My attempt went as follows:

    Code:
    levelsof RegisteredUniversity, local(university)
    local lbe : value label RegisteredUniversity
    
    foreach 1 of local university {
         `f1' : label `lbe' `1'
    }
    
    di `f1'
    
    foreach i in 1-10 {
         gen AppliedTo`university'A = 1 if UniversityChoice`i'  == 1
    }
    I understand this is a lengthy question, but any help or guidance would be much appreciated. I am happy to provide any clarity if needed.

    Cheers,
    Kevin

  • #2
    Let's first suggest some ways to make your code more concise. Each pair of generate and recode can be collapsed to a generate statement and further slimming down could make use of inlist() and local macros. I've also added some spaces and removed some unnecessary parentheses.

    That process exposed one bug from your point of view, I think, and one from Stata's point of view.

    Code:
    local UC UniversityChoice 
    local OTC OfferToChoice 
    
    * Tabulate Applicants To University A
    gen AppliedToUniversityA = inlist(UniversityACode, `UC'1, `UC'2, `UC'3, `UC'4, `UC'5, `UC'6, `UC'7, `UC'8, `UC'9, `UC'10)
    tab AppliedToUniversityA
    
    * Tabulate Those That Received An Offer To University A
    * !!! note that OfferToChoice1 == 7 looks like a typo 
    gen OfferToUniversityA = <simplified code using same ideas and local macro OTC> 
    tab OfferToUniversityA
    
    * Tabulate Offer Rate Of University A
    gen UniversityAOfferRate = OfferToUniversityA / AppliedToUniversityA
    tab UniversityAOfferRate
    
    *Tabulate Acceptance Rate Of Those Who Receive An Offer 
    
    * !!! note that RegisteredUniversity = (UniversityACode) is a bug 
    gen WentToUniversityA = RegisteredUniversity == UniversityACode
    gen UniversityAAcceptanceRate = WentToUniversityA / OfferToUniversityA
    tab UniversityAAcceptanceRate
    That being so, the claim that

    I am 100% sure that it returns the proper values
    looks puzzling. The code you show us wouldn't even complete. One statement is illegal, quite apart from the other bug.

    Your broader question is harder for me because I am not good at reading a lengthy abstract description of someone else's dataset and envisaging what the data look like.

    If you get a better answer, fine by me. Otherwise for once my advice is to give a fake and much simplified but realistic example with a few students, a few universities and so on. Our FAQ Advice, which presumably you read before posting, as requested, gives detailed advice on data examples.

    ​​​​

    Comment


    • #3
      I had the same reaction as Nick to your abstract description, and was just in the process or writing it up when his note appeared. I would strongly suggest a small example using -dataex-. I also don't know what you mean by "storing them by label instead of values," but I suspect there is some confusion there or that you are doing something in some difficult way, as I never have had occasion to store things using labels. My suspicion is that there will be a much simpler and much shorter way to solve your problem that involves a different approach than the road you currently are following, but doing that would require us having a concrete example to work with. I suspect that someone will have a quick solution for you if you can post that.

      Comment


      • #4
        Hi Nick and Mike,

        Thank you both for the quick reply! I apologize as this was my first post and I certainly should've been more detailed.

        I've built out a sample dataset in which there are ten students (observations). They can all apply to up to ten schools (UniversityChoice1-UniversityChoice10). I have data on whether or not they receive an offer to those schools (OfferToChoice1-OfferToChoice10). I also have data on which university they register at (RegisteredUniversity).

        The data (imported from dataex) looks as follows. Note that UniversityChoice7-UniversityChoice10 and OfferToChoice7-OfferToChoice10 are included in the dataset, but could not be included in dataex due to a limit in the number of variables one can include.

        Dataex Example

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte(UniversityChoice1 UniversityChoice2 UniversityChoice3 UniversityChoice4 UniversityChoice5 UniversityChoice6 OfferToChoice1 OfferToChoice2 OfferToChoice3 OfferToChoice4 OfferToChoice5 OfferToChoice6 RegisteredUniversity)
        1 3 5 8 . . 1 1 1 1 . . 5
        2 7 3 . . . 0 0 1 . . . 3
        1 2 7 . . . 1 1 1 . . . 7
        5 2 1 . . . 0 1 1 . . . 2
        5 1 2 3 . . 1 0 0 1 . . 5
        2 5 4 3 1 6 1 0 1 1 0 1 4
        6 5 1 . . . 0 1 1 . . . 5
        1 3 2 . . . 1 0 1 . . . 1
        6 2 5 4 . . 0 0 1 1 . . 5
        5 1 8 7 4 3 1 1 0 1 1 1 1
        end
        5 denotes the school code for University A in the UniversityChoiceX variables and the RegisteredUniversity variable. So, if UniversityChoice1 == 5, it means that that University A was the applicants first choice of school, and if RegisteredUniversity == 5, it means that the student was accepted to University A and accepted their offer there.

        As of right now, my code reads as follows. Note that I've been unable to successfully implement the <inlist> function when tabulating those that received an offer to University A.

        Code:
        local UC UniversityChoice 
        local OTC OfferToChoice 
        
        * 5 is code for University A
        
        * Tabulate Applicants To University A
        gen AppliedToUniversityA = inlist(5, `UC'1, `UC'2, `UC'3, `UC'4, `UC'5, `UC'6, `UC'7, `UC'8, `UC'9, `UC'10)
        tab AppliedToUniversityA
        // we observe that there are 7 applicants to University A
        
        * Tabulate Those That Received An Offer To University A
        gen OfferToUniversityA = 1 if UniversityChoice1 == 5 & OfferToChoice1 == 1 | UniversityChoice2 == 5 & OfferToChoice2 == 1 | UniversityChoice3 == 5 & OfferToChoice3 == 1 | UniversityChoice4 == 5 & OfferToChoice4 == 1 | UniversityChoice5 == 5 & OfferToChoice5 == 1 | UniversityChoice6 == 5 & OfferToChoice6 == 1 | UniversityChoice7 == 5 & OfferToChoice7 == 1 | UniversityChoice8 == 5 & OfferToChoice8 == 1 | UniversityChoice9 == 5 & OfferToChoice9 == 1 | UniversityChoice10 == 5 & OfferToChoice10 == 1
        recode OfferToUniversityA (.=0)
        tab OfferToUniversityA
        //we observe that 5 applicants received an offer to University A
        
        * Tabulate Offer Rate Of University A
        gen UniversityAOfferRate = OfferToUniversityA / AppliedToUniversityA
        tab UniversityAOfferRate
        //we observe an offer rate of 71.43%
        
        *Tabulate Acceptance Rate Of Those Who Receive An Offer 
        gen WentToUniversityA = RegisteredUniversity == 5
        gen UniversityAAcceptanceRate = WentToUniversityA / OfferToUniversityA
        tab UniversityAAcceptanceRate
        //we observe an acceptance rate by those who receive an offer of 80%
        My Question

        What I'm looking to do is to run a loop in which I can automate this process for University B, University C, etc.

        Again, thank you both for the quick response and help.

        Cheers,
        Kevin



        Comment


        • #5
          (Note to others: subject line does not match, as it referenced a technique rather than a problem.)

          I suppose loops might work, but I don't think that's an easy approach, as you have discovered. The key here is conceptual: The unit of observation/analysis should be the application, not the student. Give this a try and see if it works:


          Code:
          // Var names are cumbersome for me
          rename UniversityChoice*  Uapp*
          rename OfferToChoice* offer*
          // Restructure data so that the application is the unit of analysis.
          gen long id = _n  // example omitted student id
          reshape long Uapp offer, i(id) j(sequence)
          label var Uapp "Univ to which student applied"
          drop if missing(Uapp) // no one applied
          gen byte chosen = (Uapp == RegisteredUniversity)
          label var chosen "applicant chose this school"
          //
          // Collapse to university level
          collapse (count) Napp = sequence (sum) Noffer = offer Nchosen = chosen, ///
             by(Uapp)
          label var Napp "Number of apps received"
          label var Noffer "Number of offers made"
          label var Nchosen "Number of offers accepted"
          rename Uapp university
          list

          Comment


          • #6
            Hello Mike,

            This worked spectacularly for what I was looking for. Thanks for all of your help - it's much appreciated.

            Cheers,
            Kevin

            Comment

            Working...
            X