Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to repeat commands over multiple data files

    Hello folks, I am working on a project which requires me to grab multiple data files (from 2001 to 2016, each year has a separate data file). The workflow is:

    run commands using each data file,
    then save the final data,
    Open another data file,
    run commands and save the final data.

    Ideally I want to append all the final data from each data files in one big data file. To make myself clear, below is the command that I have to run for the first (2001) data file:

    Code:
    use "/Volumes/MY DRIVE/dct_s2001_f.dta", clear
    
    keep if arank >=1 & arank <= 5
    
    egen fptot = rowtotal(staff15 staff16) if arank == 1
    egen asptot = rowtotal(staff15 staff16) if arank == 2
    egen aptot = rowtotal(staff15 staff16) if arank == 3
    
    rename fptot tot1
    rename asptot tot2
    rename aptot tot3
    
    forval i = 1/3{
        gen pct_`i'_nral_m = staff01/tot`i'
        gen pct_`i'_nral_f = staff02/tot`i'
        
        gen pct_`i'_bl_m = staff03/tot`i'
        gen pct_`i'_bl_f = staff04/tot`i'
        
        gen pct_`i'_ap_m = staff07/tot`i'
        gen pct_`i'_ap_f = staff08/tot`i'
        
        gen pct_`i'_hs_m = staff09/tot`i'
        gen pct_`i'_hs_f = staff10/tot`i'
        
        gen pct_`i'_wh_m = staff11/tot`i'
        gen pct_`i'_wh_f = staff12/tot`i'
    }
    
    collapse (last) staff01 staff02 staff03 staff04 staff07 staff08 staff09 staff10 staff11 staff12 staff15 staff16 ///
             (firstnm) pct_1_nral_m pct_1_nral_f pct_1_bl_m pct_1_bl_f pct_1_ap_m pct_1_ap_f pct_1_hs_m pct_1_hs_f pct_1_wh_m pct_1_wh_f ///
                       pct_2_nral_m pct_2_nral_f pct_2_bl_m pct_2_bl_f pct_2_ap_m pct_2_ap_f pct_2_hs_m pct_2_hs_f pct_2_wh_m pct_2_wh_f ///
                       pct_3_nral_m pct_3_nral_f pct_3_bl_m pct_3_bl_f pct_3_ap_m pct_3_ap_f pct_3_hs_m pct_3_hs_f pct_3_wh_m pct_3_wh_f ///
                       pct_aapi_fp pct_aapi_ap ///
                       , by(unitid)
    By now, I have to save the final data into a new separate data file. Then, I need to open the next file (2002) and repeat all the above commands. Instead of copying and pasting codes for each year, is there a faster way I can achieve this? I never used global or macro before and so if anyone could give me some advice on how to do this, that'd be very appreciated!!


  • #2
    You set this up as a program and then call the program with changing values.

    Alternatively, it looks like you could put this in a loop:
    forvalues y=2001/2005 {
    use "/Volumes/MY DRIVE/dct_s`Y'_f.dta", clear
    ...ALL YOUR REMAINING CODE ... THEN AFTER COLLAPSE
    save "/Volumes/MY DRIVE/collapse`Y'_f.dta", clear
    }
    having created all your datasets, you can then append or merge them. This can also be done in a loop or just a few merge statements.

    Comment


    • #3
      Note, the Y in the forvalues must be upper case if the Y later is upper case.

      Comment

      Working...
      X