  • Renaming datasets by a variable value using loop

    I have a lot of datasets named 1.dta, 2.dta, 3.dta...40.dta. My goal is to rename each by the values of a variable 'countryCode' so that I know which country the dataset is for.

    Here is my wild guess, but it's not working:

    Code:
     forvalues i = 1/20 {
        use `i'.dta, clear 
        local country = v000 
        rename `i'.dta `country'.dta   
        clear `i'.dta    
    }
    Please let me know if there is a way to perform this.

    Thanks in advance!

  • #2
    I see two problems here:
    • You need to set the local macro country to the value of v000 in a specific observation. If every observation takes the same value, you could use local country = v000[1].
    • The rename command renames variables, not files, and clear removes objects from memory rather than erasing files. You could instead do
    Code:
    save `country'.dta, replace
    erase `i'.dta
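
Putting the two points together, the whole loop might look something like the sketch below. It assumes v000 is a string variable that is constant within each file, that all 40 files sit in the current working directory, and that no two files share a country code (otherwise a later save would overwrite an earlier one).

```stata
* Sketch only: re-save each numbered file under its country code,
* then delete the numbered original.
forvalues i = 1/40 {
    use "`i'.dta", clear
    * take the country code from the first observation
    local country = v000[1]
    save "`country'.dta", replace
    erase "`i'.dta"
}
```

Because of the replace option, it may be worth running this on a copy of the directory first.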



    • #3
      Hi Hemanshu, thanks so much for the help!



      • #4
        Here's another way that should be more efficient. I assume you have a data set containing 40 observations, with the country codes in order from 1 to 40, so that observation i holds the code for file i.dta. I assume the country code variable is named v000.
        Code:
        use country_codes_data_set, clear
        forvalues i = 1/40 {
            !rename `i'.dta "`=v000[`i']'.dta"
        }
        The ! in front of the rename command tells Stata that what follows is not a Stata command but a command to be passed to and executed by your operating system. Thus !rename does rename data sets, not variables. The main advantage of this code is that there is no need to read a data set in and then save it again when all you really want to do is change its name, not its contents.
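
One caveat: rename is the Windows shell command, and the usual equivalent on macOS and Linux is mv. A possible way to make the same loop work on either system is to pick the command from c(os), as in this sketch. It keeps the assumptions above (a lookup data set called country_codes_data_set with v000 in file order); the local names cmd and newname are just illustrative.

```stata
* Sketch only: choose the shell command according to the operating system.
use country_codes_data_set, clear
local cmd = cond("`c(os)'" == "Windows", "rename", "mv")
forvalues i = 1/40 {
    * country code stored in observation i of the lookup data set
    local newname = v000[`i']
    !`cmd' "`i'.dta" "`newname'.dta"
}
```

The macros are expanded by Stata before the line is handed to the operating system, so the shell only ever sees the final file names.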



        • #5
          Thanks so much, Clyde! It looks like my question was badly phrased: 'My goal is to rename each by the values of a variable 'countryCode' so that I know which country the dataset is for.'

          So I'd like to rephrase it. There are ~40 datasets in the directory, and I want to create a loop that will: 1) open a dataset, 2) tabulate the country variable (tab country), 3) rename the dataset by the output of the tab command, and so on. Please let me know if this makes the question a little clearer.





          • #6
            There are ~40 datasets in the directory, and I want to create a loop that will: 1) open a dataset, 2) tabulate the country variable (tab country), 3) rename the dataset by the output of the tab command, and so on. Please let me know if this makes the question a little clearer.
            It is clear that what you have and want are different from what I (mis)understood. It remains a little unclear exactly what you do want. You refer to tabulating the country variable. Then you want to rename the dataset by the output of the tab command. So how does that work if there is more than one country mentioned in the output for the country variable? Or does that never happen in these data sets? If that never happens, and country is really a constant in the data set, then basically I would do what was advised by Hemanshu Kumar in #2, but using !rename instead of -save- and -erase-:
            Code:
            forvalues i = 1/40 {
                use `i'.dta, clear
                local country = v000[1]
                assert v000 == `"`country'"'
                !rename `i'.dta "`country'.dta"
            }
            Note: The -assert- statement in the middle of the loop verifies that all of the values of v000 are the same. If they are not, the program will stop at that point with an error message. If v000 can name more than one country in the same data set, you need to decide how you want to name such a data set. Within reason, whatever you decide about that can be implemented in code.

            Added: It dawns on me that it is somewhat wasteful to read in the entire file when all you need is the v000 variable. If your data sets are small this won't make a noticeable difference. But if they are large, you could change -use `i'.dta, clear- to -use v000 using `i'.dta, clear- and it will run faster.
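
For what it's worth, here is a sketch that combines the faster read with an operating-system-independent rename, using Stata's own -copy- and -erase- in place of the shell command. That substitution is not what was posted above, just an alternative. It assumes the country code variable is the string variable v000 from the original question and that the code is meant to run from a do-file.

```stata
forvalues i = 1/40 {
    // load only the country code variable, not the whole file
    use v000 using "`i'.dta", clear
    local country = v000[1]
    // stop with an error if the file holds more than one country code
    assert v000 == `"`country'"'
    // copy the file on disk under its new name, then delete the original;
    // without the replace option, -copy- refuses to overwrite an existing file
    copy "`i'.dta" "`country'.dta"
    erase "`i'.dta"
}
```

Leaving replace off -copy- is deliberate: if two files turn out to share a country code, the loop stops instead of silently overwriting the first one.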
            Last edited by Clyde Schechter; 09 Jul 2023, 14:20.
