Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Handling duplicate variable names in .dta files

    I have a stata-format (.dta) file created by another program (HLM). Stata correctly opens the file and works appropriately EXCEPT the file produced by HLM has two variables named the same thing (PV1, from the column residuals files in HCM3 in case it's important).

    Within Stata, "PV1 ambiguous abbreviation" prevents any operations on these variables, including renaming. All I need to do is rename one of the variables so the name is not a duplicate. I've tried edit mode, but that just calls the command line. I've tried exporting to fix elsewhere, but get the same error. I suspect the solution involves renaming based on column number or something similar?

    This is obviously an HLM bug. But I'm hopeful I can make it work from within the Stata file (rather than having to stat transfer or do something else to get a valid Stata-format file), as all the other residuals files are exporting correctly and my models take hours to run. The data are not duplicates - i.e. each column has a distinct meaning with different values - and the order in which they are present in the data tells me that meaning. I've searched the forums to no avail.

    Thanks for whatever help you can provide.

  • #2
    Perhaps
    Code:
    set varabbrev off
    rename PV1 PV1A
    or
    Code:
    novarabbrev rename PV1 PV1A
    will cause Stata to omit the check for variable abbreviations and just rename the first variable named PV1 that it encounters in the dataset.

    Added in edit: if this advice doesn't help, what version of Stata are you using?
    Last edited by William Lisowski; 07 Apr 2022, 12:13.

    Comment


    • #3
      Thanks! I'm using Stata/IC 16.0

      When I do that, I get "variable PV1 not found". I'm wondering if there's a hidden character or embedded spaces or something else weird in the name? I've tried clicking the name off the variable list and cut & pasting the output from "ds", and get the same error.

      Comment


      • #4
        Try the following:

        Code:
        qui ds
        local varnames "`r(varlist)'"
        rename (*) var#, addnumber(1)
        Then copy the result of the following:

        Code:
        di "`varnames'"
        and manually change the duplicate variable names, and then

        Code:
        rename (*) (-PASTE MODIFIED LIST OF VARIABLE NAMES-)

        Comment


        • #5
          Andrew Musau -

          That's where I'm at, at the moment, and it doesn't work for me.

          I created a Stata dataset with two variables having different (but in my case visible) illegal characters at the ends of the names, which seems to be what the problem is in the original dataset. This is what happens.
          Code:
          . use gnxl, clear
          
          . ds
          P1A!  P1A#
          
          . local varnames "`r(varlist)'"
          
          . rename (*) var#, addnumber(1)
          P1A! invalid name
                    st_varrename():  3500  invalid Stata variable name
          perform_rename_straight():     -  function returned error
                 perform_renames():     -  function returned error
                  xeq_old_to_new():     -  function returned error
                    rename_cmd_u():     -  function returned error
                      rename_cmd():     -  function returned error
                           <istmt>:     -  function returned error
          r(3500);

          Comment


          • #6
            Here's an approach that works for me, using - as Rebecca Hinze-Pifer speculated in post #1 - the variable position in the dataset. The fact that Mata proved to be at the base of rename is what led me to look to Mata for a solution.
            Code:
            . use gnxl, clear
            
            . ds
            P1A!  P1A#
            
            . mata st_varrename(1,"P1A_1")
            
            . mata st_varrename(2,"P1A_2")
            
            . ds
            P1A_1  P1A_2
            Last edited by William Lisowski; 07 Apr 2022, 14:27.

            Comment


            • #7
              OK, then I would suggest:

              Code:
              export excel myfile,  firstrow(variables)
              followed by

              Code:
              import excel myfile, clear
              treating the variable names as observations. Then a loop of the sort

              Code:
              foreach var of varlist *{
                   rename `var' `=strtoname("`=`var'[1]'")'
              }
              drop in 1
              N.b. Crossed with #6 which should be more efficient.
              Last edited by Andrew Musau; 07 Apr 2022, 14:40.

              Comment


              • #8
                Apparently export excel wants to build a list of variable names. I hit similar deadends several times before getting to Mata.
                Code:
                . use gnxl, clear
                
                . ds
                P1A!  P1A#
                
                . capture noisily export excel gnxlx.xlsx, firstrow(variables) replace
                P1A ambiguous abbreviation
                
                . capture noisily novarabbrev export excel gnxlx.xlsx, firstrow(variables) replace
                variable P1A not found
                
                .
                Below I've attached gnxl.dta for anyone who wants a broken dataset to play with. Should have done that before, my apologies.

                gnxl.dta

                Comment


                • #9
                  Thanks to both of you for trying to run this down! When I try

                  Code:
                   qui ds
                  local varnames "`r(varlist)'"
                  rename (*) var#, addnumber(1)
                  I get:

                  "1 existing variable specified repeatedly in oldname
                  You requested PV1 be renamed to 2 different names: var7 and var8"
                  Last edited by Rebecca Hinze-Pifer; 07 Apr 2022, 15:17.

                  Comment


                  • #10
                    It appears that Stata expands "*" into a variable list without checking for invalid characters, and similarly when ds reports variable names. But, in either case when the result is parsed as a variable list, the parsing is interrupted by the invalid characters.

                    In any event, the technique in post #6 should be adaptable to the dataset described in post #1.

                    Comment


                    • #11
                      Below I've attached gnxl.dta for anyone who wants a broken dataset to play with.
                      Thanks. But what intrigues me even more is how you created gnxl.dta in the first place!

                      Comment


                      • #12
                        Well, for sure not by using HLM. :-)

                        On my MacBook in Stata I created gnxl.dta with two variables named x666 and y666. I opened gnxl.dta in the BBEdit programmer's editor and, in the metadata at the top of the file, found the two variable names and replaced each name with the invalid variable names and saved the result.

                        Comment


                        • #13
                          Originally posted by William Lisowski View Post
                          Here's an approach that works for me, using - as Rebecca Hinze-Pifer speculated in post #1 - the variable position in the dataset. The fact that Mata proved to be at the base of rename is what led me to look to Mata for a solution.
                          Code:
                          . use gnxl, clear
                          
                          . ds
                          P1A! P1A#
                          
                          . mata st_varrename(1,"P1A_1")
                          
                          . mata st_varrename(2,"P1A_2")
                          
                          . ds
                          P1A_1 P1A_2
                          Thanks William Lisowski - this was the successful solution!

                          Comment

                          Working...
                          X