Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Appending and merging multiple temp files

    Dear all,

    I'm currently at a stumbling block on a Stata code I'm working on. This involves seven different variables with years from 1999 - 2016 and each year is saved as a temp file. Below is code for the years 1999 and 2000 for one variable, to show you how it's structured;

    Code:
    *importing the matching worksheet in excel file for 1999*
    import excel "$data/1999", clear sheet("I.A.2") cellrange(A2:B53) firstrow
    
    *Create year FIPSCode variable*
    gen FIPSCode=_n
    label var FIPSCode "FIPSCode"
    gen Year = 1999
    label var Year "Year"
    rename JobSearchRequired JobSrchRq
    
    /*Replacing the empty cells with "n/a" and rewriting the state D.C. as DC for
    consistency across years*/
    replace JobSrchRq = "N/A" if JobSrchRq =="*"
    replace State = "DC" if State=="D.C."
    
    *Rearranging the order of data set *
    order FIPSCode State Year JobSrchRq
    
    /*Save as a temporary dta file; the format will be temp'year ''a', where'a'
    represents the variable Mandatory Job Search at Application variable.*/
    sort FIPSCode Year
    tempfile temp1999a
    save `temp1999a'
    
    
    *importing the matching worksheet in excel file for 2000*
    import excel "$data/2000", clear sheet("I.A.2") cellrange(A2:B53) firstrow
    
    *Renaming Job Search Required variable*
    rename JobSearchRequired JobSrchRq
    
    *Rewriting the State D.C. as DC for consistency across years.*
    browse
    replace State = "DC" if State=="D.C."
    
    /*Removing super scripts which import as numbers attached to the content in the
    Job Search Required field*/
    tab JobSrchRq
    replace JobSrchRq = "Yes" if substr(JobSrchRq,1,1)=="Y"
    replace JobSrchRq = "No" if substr(JobSrchRq,1,1)=="N"
    
    *Check to make sure super scripts were actually taken out*
    tab JobSrchRq
    
    *Create year FIPSCode variable*
    gen FIPSCode=_n
    label var FIPSCode "FIPSCode"
    gen Year = 2000
    label var Year "Year"
    
    *Rearranging the order of data set *
    order FIPSCode State Year JobSrchRq
    
    /*Save as a temporary dta file; the format will be temp'year ''a', where'a'
    represents the variable Mandatory Job Search at Application variable.*/
    sort FIPSCode Year
    tempfile temp2000a
    save `temp2000a'
    I will need help appending all the years for each variable and finally merging all the variables into one huge data file.
    Last edited by David Amankwah; 26 Aug 2018, 22:49.

  • #2
    So the hard part is already done. You have code to handle each year's file from 1999 through 2016. To append them all together is quite easy:

    Code:
    clear
    tempfile building
    save `building', emptyok
    forvalues i = 1999/2016 {
        append using `temp`i'a'
        save `"`building'"', replace
    }
    At the end of that code, the temporary file `building' will contain all 18 years' files appended together. You can then put "finishing touches" on it, if necessary, and save it as a regular .dta file.

    I don't know what you mean by "finally merging all the variables into one huge data file." The files you have created all contain the same variables, and there is nothing else to merge in. What am I missing here?

    Comment


    • #3
      Thank you for your timely response. In terms of finally merging all the variables into one huge data file, I mean there are seven variables with which each variable has data for years 1999 to 2016. So after appending the temp files with each year, I have to merge all the appended seven variables into one. Also, when I run the code provided for appending, I get the error "invalid file specification".
      Last edited by David Amankwah; 27 Aug 2018, 00:21.

      Comment


      • #4
        I'm guessing the error you get arises from running the code shown in #2 separately from the code that created the temporary files shown in #1. Temporary files are named with local macros. And local macros go out of scope at the end of the program that creates them. In this context, "program" means a formal Stata program, or a do-file run as a whole, or a segment of a do-file run separately. In the case of tempfiles, not only does the reference to them through a local macro expire, but the file itself is deleted. So you have to re-run all of the code that #1 is an excerpt of, and the code in #2 in one fell swoop.

        Concerning the merging of the appended files, it's fairly straightforward. Let's assume that you have saved each of the appended files and named them file_a.dta, file_b.dta, file_c.dta,... I don't know how many of them there are, but you get the idea. I also assume that within each of the appended files, observations are uniquely identified by the combination of FIPScode and year. (If that is not the case, then below you must replace FIPScode year in the -merge- command by whatever group of variables is necessary to uniquely identify observations.) Then merging them all together is similar to the code in #2:

        Code:
        local to_merge: dir "." files "file_*.dta"
        
        local starter: word 1 of `to_merge'
        local to_merge: subinstr local to_merge "`starter'" ""
        
        use `starter', clear
        foreach f of local to_merge {
            display `"merging `f'"'
            merge 1:1 FIPScode year using `f'
            tab _merge
            drop _merge
        }
        At the end of this code all of the files will have been merged into a single file in active memory, which you can then, again "put finishing touches on," if appropriate, and save as a permanent file.

        Note: The first command in this file presumes that the files to be merged together are in the current working directory. If that is not the case you can either use the -cd- command to change the current working directory to the place where those files are, or you can change the "." in the -local to_merge- command to whatever the appropriate directory is. If you choose the latter route, then you must also change the -user `starter', clear- and -merge 1:1 FIPScode year using `f'- commands to refer to the full pathname of the files. (For these reasons, I think it is simpler to -cd- to the directory where the files are.)

        When the code as run, it behooves you to review the output of the -merge- commands to make sure that things have come together in the way you expect them to. When a large number of data sources are being put together, inconsistencies among them often turn up and they don't fit together properly, additional data cleaning being needed first.


        Comment


        • #5
          Thanks for such a detailed response. The code for the appending the years for the variables worked and you're right; it wasn't working previously because I was running the code separately. However, when I run the code for merging the appended variables, and it gets to the line:

          Code:
           
           local to_merge: subinstr local to_merge "`starter'" ""
          I get the error "invalid syntax". Thanks for all the help again.

          Comment


          • #6
            So, I set up a bunch of files named file_a.dta, file_b.dta, ... in a directory and ran the code. It ran with no error messages.

            However, when I then renamed those files so that they did not match the pattern file_*.dta, then I was able to reproduce your error.

            So, my hypothesis is that you don't actually have files with names that match file_*.dta in your working directory. It is possible that you need to change the current working directory to whatever directory you saved the appended files in.

            Another possibility is that when you created the appended files you saved them with names that don't match the pattern file_*.dta. Do recall that I prefaced that code in #4 with "Let's assume that you have saved each of the appended files and named them file_a.dta, file_b.dta, file_c.dta..." Perhaps you didn't actually use those names to save the appended files under. If you used names that don't match the pattern file_*.dta, then you will get this error. The solution is to change the reference to "file_*.dta" in the first command of the code in #4 to some appropriate wildcard pattern that properly expresses the names of those files.

            Comment


            • #7
              Thanks again for your response. I fixed that error based on your suggestion to change the pattern in the line of code to match the way I named the appended files. Also, I went ahead and changed the working directory to where the appended files are but I was a bit unclear with regards to this part of your previous post with regards to the file merging code;

              "Note: The first command in this file presumes that the files to be merged together are in the current working directory. If that is not the case you can either use the -cd- command to change the current working directory to the place where those files are, or you can change the "." in the -local to_merge- command to whatever the appropriate directory is. If you choose the latter route, then you must also change the -user `starter', clear- and -merge 1:1 FIPScode year using `f'- commands to refer to the full pathname of the files. (For these reasons, I think it is simpler to -cd- to the directory where the files are.)"

              To clarify, I took the latter route in your note and not clear on the second half of that route and some clarification will be much appreciated. Thanks.

              Comment


              • #8
                With most commands that do file-handling, if you specify only a bare filename without a path, Stata (and your OS) will assume that the intended file is in the current working directory, and will not look for it elsewhere. (There are a few exceptions, but let's ignore those; none of them will arise in what you are doing here.) Consequently, if you want to work with a file called `f' but it is not in the current working directory, you have to refer to it by the combination of its pathname and filename. So, if, for example, you need to -use-, or -merge-, or -append- files in directory "C:/Users/Amankwah/Foobar", but your current working directory is elsewhere, then:

                Code:
                // THESE WILL FAIL
                use `"`f'"'
                append using `"`f'"'
                merge ... using `"`f'"'
                
                // THESE WILL WORK
                use `"C:/Users/Amankwah/Foobar/`f'"'
                append using `"C:/Users/Amankwah/Foobar/`f'"'
                merge ... using `"C:/Users/Amankwah/Foobar/`f'"'
                The same will apply to the file `starter'.

                If you find it too tedious and error-prone to repeatedly type out a long pathname (I would), you can use a local macro with a shorter name to abbreviate it:

                Code:
                local mypath C:/Users/Amankwah/Foobar
                
                use `"`mypath'/`f'"'
                // ETC.
                Important note: Even if you are using Windows (for which the native path separator is \) you should use / as your path separator in Stata. Stata for Windows will change all those to \ before sending them to windows. The reason for avoiding \ in path names in Stata is that the \ character can interfere with Stata's interpretation of local macros, so that, for example,
                Code:
                use `"`mypath'\`f'"'
                will fail because Stata will not understand the reference to local macro f following a \ character.


                Comment


                • #9
                  Thanks, Clyde! This perfectly solves my problem.
                  Thanks again

                  Comment


                  • #10
                    dear all
                    i am done saving tempfiles of eight modules but i have problems with merging the tempfiles into a single dataset. please help

                    Comment


                    • #11
                      Chifundo Luwayo show me the code you used please.

                      Comment


                      • #12
                        *MERGING DATA FILES use `consumption', clear merge 1:1 case_id using `module_e this is the code it is showing

                        Comment


                        • #13
                          *MERGING DATA FILES use `consumption', clear merge 1:1 case_id using `module_e this is the code i am using

                          Comment


                          • #14
                            Dear all i am trying to run a correlation analysis and a regression analysis in stata but it is giving me an error saying no observations when i run them, below are the stata commands; *Correlation analysis* sureg (haz Loan_access LOCATION HH_Food_Consumed emplo_head child_illness none_hh plsc_hh secondary_hh postsec_hh hhsize age_n_months SEXChild)(waz Loan_access LOCATION HH_Food_Consumed emplo_head none_hh plsc_hh secondary_hh postsec_hh child_illness hhsize age_n_months SEXChild), corr *Regression analysis* sureg (haz Loan_access LOCATION HH_Food_Consumed emplo_head child_illness none_hh plsc_hh secondary_hh postsec_hh hhsize age_n_months SEXChild)(waz Loan_access LOCATION HH_Food_Consumed emplo_head none_hh plsc_hh secondary_hh postsec_hh child_illness hhsize age_n_months SEXChild) your help will be greatly appreciated

                            Comment


                            • #15
                              Chifundo Luwayo Please give me an example of your data using dataex as the FAQ asks you to do. I can't help you if I don't have any of your data.

                              Comment

                              Working...
                              X