  • Loop over list of files and append but first item in list is empty

    Hello,

    I am trying to obtain a list of files matching a wildcard expression and append them all together. The list of files is long, so I would like to use a loop.
    I have figured out two ways of getting a macro that holds the list of files:
    Code:
    local filelist: dir dirname files "*pattern*.dta" //1
    ssc install fs
    fs dirname/*pattern*.dta //2
    The first produces a local macro named filelist, and the second stores the list in r(files). I can print the entire list using -di ""`filelist'""-

    I can print the list of files one by one successfully:
    Code:
    . foreach file in ""`filelist'"" {
      2. di "`file'"
      3. }
    
    file1.dta
    file2.dta
    file3.dta
    However, -append- fails inside the loop, both with and without quotes around `file':
    Code:
    . clear
    . foreach file in  ""`filelist'"" {
      2. append using ""`file'""
      3. }
    file  not found
    r(601);
    
    . foreach file in  ""`filelist'"" {
      2. append using `file'
      3. }
    invalid file specification
    r(198);
    What am I doing wrong? I suspect that the first entry in `filelist' is actually an empty string, but I don't know how to get rid of this. For now I have found a workaround using
    Code:
    if "`file'" != "" {
    append using "`file'"
    }
    inside the loop, but this seems rather inelegant.

  • #2
    I think your problem is with the use of two sets of double-quotes. When Stata sees "", it thinks you mean an empty string. So the first element of the loop becomes the null string, and you have no file with that name. This causes no problem with the display command (the first line of the display is just empty), but it does create a problem with append, when it actually looks for a file whose name is just the empty string.
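
    To see the empty element without touching any files, here is a minimal sketch. It assumes the macro holds names that are already individually quoted (which is how the -dir- extended function returns them); file1.dta and file2.dta are placeholder names and need not exist. The brackets only make an empty element visible.
    Code:
    * hypothetical contents, mimicking what -local filelist: dir ...- returns
    local filelist `""file1.dta" "file2.dta""'
    * same doubled quotes as in #1
    foreach file in ""`filelist'"" {
        display `"[`file']"'
    }
    If the diagnosis above is right, the first line displayed should be an empty pair of brackets.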

    Try this instead:
    Code:
    foreach file in `"`filelist'"' {
        append using `"`file'"'
    }
    Last edited by Hemanshu Kumar; 19 Sep 2022, 09:59.


    • #3
      Hi Hemanshu, I tried using a single set of double quotes, but the problem persists:

      Code:
      . foreach file in "`filelist'" {
      2. di "`file'"
      3. }
      
      file1.dta
      file2.dta
      file3.dta
      
      .
      So the problem of nulls still exists. Moreover, when I use -set trace on-, it actually seems that Stata is inserting extra quotes into the files which are stripped out by -di-. Unfortunately, this continues to cause headaches for me.
      Using your solution, the macro puts all the files into a single string, which strips out the empty results but is not what I want:

      Code:
      . foreach file in `"`filelist'"' {
      2. di `"`file'"'
      3. }
      
      "file1.dta" "file2.dta" "file3.dta"
      .


      • #4
        Using no quotes at all seems to have worked, which is exceedingly strange given that usually Stata will interpret that as a reference to a variable:
        Code:
        . foreach file in `filelist' {
        2. di "`file'"
        3. }
        file1.dta
        file2.dta
        file3.dta
        
        .
        I swear, it seems like Stata has 2459874508435 different hiccups when it comes to strings.
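
        A likely explanation, for what it is worth: the -dir- extended macro function stores each file name already wrapped in double quotes, so the bare macro expands to a list of quoted names that -foreach- can split on its own. Below is a minimal sketch of the full loop, assuming the folder is called dirname as in #1. The list holds file names only, not paths, so the folder is prefixed by hand here.
        Code:
        local filelist : dir "dirname" files "*pattern*.dta"
        * show the raw macro contents, quotes included
        display `"`filelist'"'
        clear
        foreach file in `filelist' {
            * compound quotes keep names with spaces intact
            append using `"dirname/`file'"'
        }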


        • #5
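          There is a difference between a string stored in a variable and a string stored in a macro: a macro reference such as `filelist' is replaced by its contents before Stata parses the command.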


          • #6
            A couple of asides:

            1. You don't need a loop to append all these files together. If you have verified that they are suitable for append (see 2. below), just a single -append using `filelist'- command will do the job. And if you want to keep track of which file each observation in the result came from, -append- has a -generate- option for that purpose.

            2. That said, mass -append- often ends in tears. Even if your data sets are coming from reliable sources that curate their data sets carefully and expertly, if the data sets number more than a handful, the probability is high that there will be incompatibilities among them that result in a malformed result or appreciable loss of data when you mass append them (whether through a single command as I suggested above, or through a loop). What should be the same variable may have slightly different names in different data sets (differences in case are common, as are digit reversals, and spelling errors are hardly rare). Or two variables with the same name in different data sets may be string in one data set and numeric in another. Or the variable may be numeric in both but have different value labels.
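
            As a rough sketch of the single-command approach in 1., assuming the working directory is the folder that holds the files and that they have already been checked for compatibility. The variable name source is just an illustrative choice.
            Code:
            * collect the matching file names from the current directory
            local filelist : dir . files "*pattern*.dta"
            clear
            * one -append- for the whole list; source records which file each observation came from
            append using `filelist', generate(source)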

            I highly recommend that you install Mark Chatfield's -precombine-, available from SSC, and use it to screen your data sets for these problems ahead of time. It will point out any potential problems, and then you can fix those problems before you run -append-. I can tell you from painful experience that this preventive approach is far more effective and efficient than trying to clean up the grotesque and defective dataset that can result from just blindly going ahead with mass -append-. Worse still, the results of mass -append- can be seriously wrong in ways that are not obvious, and you may not realize you are working with incorrect data until far down the line, necessitating a lot of rework, or possibly not even until you have presented your results to somebody who will take actions relying on them.
