Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Taking strings out of file names

    I scraped together some xls files using the almighty Python. Before I can/want to work with these files I must rename them, because this code
    Code:
    foreach f of loc files {
        
    import delim "`f'", clear
    
    sa "`f'", replace
    }
    ruins the files.

    This code gives me

    Code:
    cls
    
    
    local files : dir "`c(pwd)'" files "*Dados*"
    
    foreach f of loc files {
        
    di "`f'"
    
    }
    dadosbo_2018_1(homicÍdio doloso).xls
    dadosbo_2018_10(homicÍdio doloso).xls
    dadosbo_2018_11(homicÍdio doloso).xls
    dadosbo_2018_12(homicÍdio doloso).xls
    dadosbo_2018_2(homicÍdio doloso).xls
    dadosbo_2018_3(homicÍdio doloso).xls
    dadosbo_2018_4(homicÍdio doloso).xls
    dadosbo_2018_5(homicÍdio doloso).xls
    dadosbo_2018_6(homicÍdio doloso).xls
    dadosbo_2018_7(homicÍdio doloso).xls
    dadosbo_2018_8(homicÍdio doloso).xls
    dadosbo_2018_9(homicÍdio doloso).xls
    What I want, is to mass rename all of these by removing "dadosbo_" and "(homicÍdio doloso)" from the file names. I know how to rename them individually, but how could I only remove some strings from these file names?

  • #2
    One thing confuses me. These are .xls files but you are trying to import them with -import delim-? That's not right. I'll assume you meant -import excel-. I wold do this as

    Code:
    forvalues i = 1/12 {
        import excel dadosbo_2018_`i'(homicÍdio doloso).xls, clear // perhaps other options
        save 2018_`i', replace
    }

    Comment


    • #3
      When I try import exc, Stata tells me "It is not as xlsx file. Use the xlCreateBook() for an xls file." No idea what this means or even what it refers to, but Import delim however works on my end. For an example, when I try this code block, I get a 603 error.
      Code:
      import exc "E:\Papers\Brazil\Raw Data\DadosBO_2019_7(HOMICÍDIO DOLOSO)", clear
      file E:\Papers\Brazil\Raw Data/DadosBO_2019_7(HOMICÍDIO DOLOSO).xls could not be loaded
      You don't have to of course, but you can go to the link, click on "homicidios dolosos", and download a random file. Never seen this error before.


      Either way though, I'll have files for multiple years and months. Python's still grabbing them, but I'll have files which are named things like "dadosbo_2019_9(homicÍdio doloso).xls", "dadosbo_2021_2(homicÍdio doloso).xls". So, I would either need to use a double loop for the years too, or rename the files and save them. To me, the cleanest solution is to rename them, but I'll use the double loop presuming there's no easy way to get rid of the unneeded strings.

      Comment


      • #4
        Let's share more of the details for the benefit of others. When you import your xls-suffixed file as text Stata tells us

        Code:
        . import delimited "/Users/lisowskiw/Downloads/DadosBO_2019_7(HOMICÍDIO DOLOSO).xls"
        (encoding automatically selected: UTF-16LE)
        (54 vars, 709 obs)
        (Note that the non-ASCII character in the filename is not rendered correctly in the forum software,)

        The file is not in fact in Excel XLS format - it is a text delimited file encoded in UTF-16LE with a meaningless suffix of xls.

        Now, in post #1 you tell us your code ruins the files. What exactly does that mean?

        Stata saves the imported Stata dataset with the specified xls suffix rather than a dta suffix, and thus overwrites the xls input file.

        You need to strip off the .xls from the end of the filename before saving it.

        Code:
        . local f "/Users/lisowskiw/Downloads/DadosBO_2019_7(HOMICÍDIO DOLOSO).xls"
        
        . import delimited "`f'", clear
        (encoding automatically selected: UTF-16LE)
        (54 vars, 709 obs)
        
        . local o : subinstr local f ".xls" ""
        
        . save "`o'", replace
        (file /Users/lisowskiw/Downloads/DadosBO_2019_7(HOMICÍDIO DOLOSO).dta not found)
        file /Users/lisowskiw/Downloads/DadosBO_2019_7(HOMICÍDIO DOLOSO).dta saved
        
        .
        And the xls file is left intact.

        The same macro subinstr function will allow you to remove other components of the input file names when creating the output file names.

        Comment


        • #5
          I ended up doing
          Code:
          foreach x of loc files {
          loc a: di strlen("`x'")
          
          if `a' == 37 {
              
          local subfile = substr("`x'", 9, 14) + ".xls"
          
              !rename "`x'" "`subfile'"
          }
          
          else if `a'==38 {
              
          local subfile = substr("`x'", 9, 15) + ".xls"
          
              !rename "`x'" "`subfile'"
              
              
          }
          }
          It still keeps the "(HOMICÍDIO" string, but it's an improvement from before. Although I'm sure there's a better way to do this.

          Comment


          • #6
            Here is a more robust way to handle the file renaming within Stata, rather than say using the shell to handle this (which in Linux and mac would be maybe more efficient, but I'm not aware of a clean command line-based solution for Windows).

            The code presupposes that you have one root directory where your files are located. Inside this directory are two key folders: one to hold the existing (old) files, and one the renamed (new) files. The old file names are first pattern matched using regex and new file names derived based on the pattern (plus whatever other processing you desire). The the files are copied to the new file directory with the new name. Optionally, you may erase the old file.

            Here I adopt the unimaginative, but easy, pattern of renaming files with the pattern old_*.txt to new_*.csv

            Code:
            cd "/path/to/your/file/root"
            local oldfiles : dir "./old/" files "*.txt" , respectcase
            mac list _oldfiles
            
            foreach fn of local oldfiles {
              local oldfn `"`fn'"'
              local res = ustrregexm(`"`oldfn'"', "old_(.*)\.txt", .)
              if `res'==1 {
                local newfn = "new_" + ustrregexs(1) + ".csv"
                di `"Renaming `oldfn' to `newfn'"'
                copy `"./old/`oldfn'"' `"./new/`newfn'"', public
                * erase `"./old/`oldfn'"'  // CAUTION: optionally, erase the old files
              }
            }
            After this point, you can go on to import each file into Stata using -import delimited-.
            Last edited by Leonardo Guizzetti; 27 Mar 2022, 21:23.

            Comment


            • #7
              The solution I ended up using was
              Code:
              local files : dir "`c(pwd)'" files "*dadosbo_*"
              cls
              
              foreach x of loc files {
              local subfile = subinstr("`x'", "(homicÍdio doloso)", "", .)
              
                  !ren "`x'" "`subfile'"
              }
              
              local files : dir "`c(pwd)'" files "*dadosbo_*"
              cls
              foreach x of loc files {
              local subfile = subinstr("`x'", "dadosbo_", "", .)
              
                  !ren "`x'" "`subfile'"
              }
              It's sort of unclean, but it does the job. Presumably, there's a more elegant solution which exists. I imagine I could simplify it by using the ustrregexs function..

              Comment

              Working...
              X