Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Merging datasets using a loop with macros

    Hi all,

    I am new to Stata and I am trying to use loops to merge pairs of datasets for 4 countries. In other words, I intend to merge year1 data with year2 data for country1, year1 data with year2 data for country2, etc. I have two subfolders named "year_(x)", x being either 1 or 2, and each of them gathers one folder for each of my countries of study. This folder contains only one .dta file that is not necessarily the same across countries. Here is my code:

    Code:
    global path "F:/Research"
    local countries Guinea Togo Mozambique Zambia
    
    foreach country of local countries {
    
    local masterfile: dir "$path/year_1/`country'/" files "*.dta" // If I understood correctly, this command stores in a macro the names of the .dta files contained in the folder "`country'". As a reminder, there is only one dataset per folder "`country'".
    display `filelist'
    cd "$path/year_1/`country'"
    
    use "./`masterfile'", clear
    
    local usingfile: dir "$path/year_2/`country'/" files "*.dta"
    cd "$path/year_2/`country'"
    
    merge 1:m id using "./`country'/`usingfile"
    
    cd "$path/outputs"
    save `country'_merged
     }
    Unfortunately, when I try to run this code I get the following output:

    Code:
    invalid 'guinea'
    r(198);
    I suspect this error is linked with the -use- command line. When I add a pair of brackets between `masterfile', as follows:

    Code:
    use "./"`masterfile'"", clear
    I get the following output:

    Code:
    invalid '"guinea-2016-full-data.dta'
    r(198);
    Which is the name of my file for Guinea.

    Could someone help me with advices on how to run this code properly? Many thanks for your time and precious help!
    Last edited by Julia Simon; 04 Apr 2022, 18:47.

  • #2
    Carefully compare the two lines in red below to see what I changed from in the code copied from post #1. In the first line, you needed quotation marks around the local macro so that display would treat it as a string to display rather than as a variable name. In the second line, you omitted the ' after the local macro usingfile.

    Code:
    global path "F:/Research"
    local countries Guinea Togo Mozambique Zambia
    
    foreach country of local countries {
    
    local masterfile: dir "$path/year_1/`country'/" files "*.dta"
    display "`masterfile'"
    cd "$path/year_1/`country'"
    
    use "./`masterfile'", clear
    
    local usingfile: dir "$path/year_2/`country'/" files "*.dta"
    cd "$path/year_2/`country'"
    
    merge 1:m id using "./`country'/`usingfile'"
    
    cd "$path/outputs"
    save `country'_merged
     }
    In this you have unnecessary quotation marks around the local macro.
    Code:
    use "./"`masterfile'"", clear
    which should read exactly as it appeared in the code above.

    Comment


    • #3
      William: Thank you for the correction! It was definitely needed and I wasn't aware of the importance of using appropriate quotation.

      I think I found how to solve my problem. My code actually needed two other loops, one for the local masterfile and one for the local usingfile. Here's the code that yielded expected results:

      Code:
      global path "F:/Research"
      local countries Guinea Togo Mozambique Zambia  
      
      foreach country of local countries {  
      
      local masterfile: dir "$path/year_1/`country'/" files "*.dta"
      display "`masterfile'"        
      
      foreach m of local masterfile {      
      cd "$path/year_1/`country'"      
      use "./`m'", clear            
      
      local usingfile: dir "$path/year_2/`country'/" files "*.dta"            
      foreach u of local usingfile {            
      cd "$path/year_2/`country'"            
      merge 1:m id using "./`country'/`u'"  
      
      cd "$path/outputs"
      save `country'_merged  
      }
      }
      }
      However, I have to admit that I found this fruitful code after thousands of random editings and I still do not understand the logic behind it. Could someone give me more details about the difference between code in #1 and the code written above?

      Many thanks!
      Last edited by Julia Simon; 04 Apr 2022, 20:29.

      Comment


      • #4
        The only way for the loops to make a difference is if one of the 8 directories you are reading your data from has more than 1 ,dta file in it, despite your assertion to the contrary in post #1.

        Run the following and review the output carefully. Each macro listed should contain just one .dta filename.
        Code:
        global path "F:/Research"
        local countries Guinea Togo Mozambique Zambia  
        foreach country of local countries {  
        display "`country'"
        local masterfile: dir "$path/year_1/`country'/" files "*.dta"
        local usingfile: dir "$path/year_2/`country'/" files "*.dta"            
        macro list _masterfile  _usingfile      
        }
        Last edited by William Lisowski; 05 Apr 2022, 07:59.

        Comment

        Working...
        X