Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Renaming variables according to position

    Good morning,

    Long text to avoid XY problem.

    I'm writing a Stata code that opens and conduct cleaning steps on excel files that will be sent weekly to me by a another person. Since the excel file the person sent as an example to me contains several sheets (which I would like to be appended). I decided on using xls2dta. Using an option on xls2dta, I created new variables that contain the information on the file path and sheet within the file for which a given observation was obtained. The sheet within the excel file is specially important, because information is contained on the name of the sheet. The exact xls2dta command I'm using is:
    Code:
    xls2dta , clear allsheets gen(file sheet) : append using "file_path\file_name.xls", import(firstrow allstring)
    The excel file also contain double headers and the first line of the header is a description of the excel file merged over several cells, which xls2dta reads as the variable name for the first variable, leaving the next variables as B, C, D and so on. The first observation of the the file already imported to Stata contains the real variable names. To solve that issue, I used
    Code:
    renvars, map(strtoname(@[1]))
    drop in 1
    which maps the content of the first observation and rename the variables according to that. It also gets rid of the first observation after mapping variable names. This works well, except that it also renames my "file" and "sheet" variables created using xls2dta.

    I see some possible approaches to solve the issue of having my file and sheet variables renamed:
    1. Rename those 2 variables using evocative names - don't want to go this way because different weekly excel files will have different names and different sheet names, so this would introduce a step of human work. Having something automatic is what I have in mind.
    2. I could limit the reach of renvars to not change the name of the last 2 variables of the dataset - the file and sheet variables are expected to always be the last 2 vars of the dataset. However, I don't know how to do that (or even if that's possible). This seems to be too much of a specific problem, which is not covered in the renvars help file.
    3. I could rename the 2 variables to their original name again without using their evocative name, but their position in the dataset (they are always expected to be the last 2 vars of the dataset). I found an interesting post on Statalist that approaches a similar problem, but the solution presented there involves changing variable names based on their numeric position, and I expect this to change over different excel files sent to me. I can not guarantee that file and sheet will always be variables number 11 and 12. A stronger assumption is that they will always be the last 2 variables, but even on the post mentioned above, I could not find information on how to code to change the name of the "last 2 variables" of a dataset.
    4. I could ask the person sending me the excel files to add a variable with the name of the sheet on each sheet of the excel file, but there is resistance for this change. It doesn't seem like it's going to happen.
    I appreciate any input you can give me. There are several steps that I'm doing, so I'm sharing snippets of the dataset in different stages of the problem:

    Immediately after xls2dta:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 AllSECONDFARMSAREESITIMATED str6(B C) str9 D str23 file str10 sheet
    "load time" "ORIGIN" "GROUP#" "# to load" "file_path\file_name.xls" "sheet_name"
    "7AM"       "A"      "X"      "10"        "file_path\file_name.xls" "sheet_name"
    "8AM"       "A"      "X"      "10"        "file_path\file_name.xls" "sheet_name"
    "6AM"       "B"      "Y"      "20"        "file_path\file_name.xls" "sheet_name"
    end
    Immediately after renvars (notice how my vars file and sheet got renamed):
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str9 load_time str6(ORIGIN GROUP_) str9 __to_load str23 file_path_file_name_xls str10 sheet_name
    "7AM" "A" "X" "10" "file_path\file_name.xls" "sheet_name"
    "8AM" "A" "X" "10" "file_path\file_name.xls" "sheet_name"
    "6AM" "B" "Y" "20" "file_path\file_name.xls" "sheet_name"
    end
    Hope this is clear, the post got way longer than anticipated.

    Best;

  • #2
    How about

    Code:
    ds file sheet , not
    renvars `r(varlist)' , map(strtoname(@[1]))
    Best
    Daniel

    Comment


    • #3
      The key issue is that Igor does not know the names of the last 2 variables.

      Code:
      qui ds *
      local lastvar: word `c(k)' of `r(varlist)'
      local secondlastvar: word `=`c(k)'-1' of `r(varlist)'
      
      replace `lastvar' = "`lastvar'" in 1
      replace `secondlastvar' = "`secondlastvar'" in 1
      
      renvars, map(strtoname(@[1]))
      drop in 1

      Comment


      • #4
        Originally posted by Romalpa Akzo View Post
        The key issue is that Igor does not know the names of the last 2 variables.
        I do not think so; those are specified in the xls2dta (SSC, by the way) call.

        Positional names are easier obtained with in Mata:

        Code:
        mata : st_varname((st_nvar(), st_nvar()-1))
        Best
        Daniel
        Last edited by daniel klein; 14 Feb 2019, 11:53.

        Comment


        • #5
          Hi, thanks for the inputs and I apologize for slightly confusing explanation of the problem.

          Before running the renvars command, the last 2 variables will always be named file and sheet (because that's what I ask xls2dta to name them, as Daniel pointed). The issue arises after the renvars command is issued, because the way I set it up, it changes the names of all variables. The name of the last 2 variables (file and sheet) changes according to the file and sheet within the excel file, all of which changes every week. Despite that, both solutions worked perfectly well (thanks again for that).

          If anyone is interested in the logic, Daniel's solution limits the reach of the renvars command to all variables but file and sheet using the command ds file sheet, not.

          Romalpa's solution stores in a macro the name of the last 2 variables (and that's why her code works regardless of the name of the last 2 variables) and then replace the value in the first observation for those names. After that, her code does the step I did (rename all vars according to the values in the first observation).

          Thank you both again, I highly appreciate both solutions. Hope this also helps other people.

          Best;

          Comment


          • #6
            Thanks Igor for the feedback. Since the relevant variables names (file and sheet) have been known at the very beginning, Daniel’s solution is the best one.

            Comment

            Working...
            X