
  • Problems with the letters æ, ø, å

    Hello,

    I have tried to open some of my old .dta files containing variable names and labels with the letters æ, ø, å. In my current Stata/IC 15.1 these letters are replaced with �, and where they appear in variable names, I am unable to access the variables.

    Unicode is supposed to be supported, and I have tried the following commands, without success:

    clear
    cd "F:\Personlige\Ø Celia\DFS-undersøgelsen"
    unicode encoding set Windows-1252
    unicode analyze *
    unicode translate *


    The text is still full of ���, and I am still unable to edit or use the affected variables.
    I would love some advice! I seem to have reached a dead end.

    Best regards,
    Celia Skaarup
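Why ��� rather than æ, ø, å? A minimal Python sketch (outside Stata), assuming, as guessed later in this thread, that the old files store these letters as single Windows-1252 bytes. Stata 14 and later read strings as UTF-8, and those single bytes are not valid UTF-8, so each one is shown as the replacement character U+FFFD. The byte values are an illustrative assumption.

```python
# Hypothetical bytes: æ, ø, å as stored by a pre-Unicode program using Windows-1252.
legacy_bytes = bytes([0xE6, 0xF8, 0xE5])

# Read as UTF-8 (as a Unicode-aware program does): every invalid byte
# is replaced by U+FFFD, which is displayed as the "�" symbol.
as_utf8 = legacy_bytes.decode("utf-8", errors="replace")

# Read with the encoding the bytes were actually written in.
as_cp1252 = legacy_bytes.decode("cp1252")

print(as_utf8)    # ���
print(as_cp1252)  # æøå
```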

  • #2
    FAQ 12.1: What did Stata respond?



    • #3
      Unfortunately, it is a matter of guessing the right encoding. Have you also tried latin1 (ISO-8859-1)?
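The guessing can be made a little more systematic. A rough Python sketch: decode a sample of the raw bytes with each candidate encoding and keep only those that yield characters expected in Danish text. The sample (taken from the folder name in the first post), the candidate list and the set of expected characters are all assumptions for illustration; cp865 and cp850 are old DOS code pages. Note that Windows-1252 and latin1 (ISO-8859-1) assign the same byte values to æ, ø and å, so switching between those two alone cannot change how these letters come out.

```python
# Hypothetical sample: "DFS-undersøgelsen" as written by a pre-Unicode program,
# assumed here to have used Windows-1252.
sample = "DFS-undersøgelsen".encode("cp1252")

# Assumed, non-exhaustive list of legacy encodings to try.
candidates = ["cp1252", "latin-1", "cp865", "cp850", "mac_roman"]

# Characters we expect to see in Danish labels.
expected = set("abcdefghijklmnopqrstuvwxyzæøå"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ"
               "0123456789 -_.,")

def plausible_encodings(raw, encodings):
    """Return the encodings under which every decoded character is expected."""
    hits = []
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # some byte is undefined in this encoding
        if set(text) <= expected:
            hits.append(enc)
    return hits

print(plausible_encodings(sample, candidates))  # ['cp1252', 'latin-1']
```

Both cp1252 and latin-1 survive the check, which illustrates the point above: for these letters the two encodings cannot be told apart.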



      • #4
        I have also tried latin1.

        Stata responds with error r(198) (invalid syntax, varname, etc.). See below.

        [Attached image: Udklip.PNG]



        • #5
          Please email a copy of the dataset to [email protected]. If the dataset is large, a few observations containing the letters æ, ø, å will suffice. Be sure to send the original dataset, though, not one you have already run -unicode translate- on. You might need to run -unicode restore- to restore the dataset to its original state.
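Why the original file matters: a wrong encoding guess usually does not fail, it silently produces other characters. A Python sketch, assuming hypothetically that the source file used the Nordic DOS code page cp865 but was decoded as Windows-1252:

```python
# Hypothetical original bytes for "æøå" written with the DOS code page cp865.
original = "æøå".encode("cp865")

# Wrong guess: decoding as Windows-1252 raises no error...
wrong = original.decode("cp1252")
# ...but yields typographic symbols instead of Danish letters.
print(wrong)  # ‘›†

# Starting again from the untouched original bytes with the right guess works.
right = original.decode("cp865")
print(right)  # æøå
```

This mirrors the advice above: if a translation came out wrong, go back to the original bytes (for example with -unicode restore-) before trying another encoding.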



          • #6
            Celia,

            you wrote:
            I have tried to open some of my old .dta-files containing varnames ... with the letters æ, ø, å.
            How is this possible? That would be an illegal variable name in older versions of Stata. Does "old" here refer to the post-Stata 14 era, or to something from before Unicode was introduced? If the latter, which tool did you use to create those files?

            Best, Sergiy



            • #7
              Sergiy's expectation matches the definition of a Stata variable name in version 12.1:

              A name is a sequence of one to 32 letters (A–Z and a–z), digits (0–9), and underscores (_).
              However, at least some characters from the upper ANSI range have been accepted in variable names, though not as the first character.
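The quoted rule, and the looser behaviour observed in practice, can be written as two small checks. A Python sketch: the strict pattern follows the quoted definition plus the usual Stata requirement that a name not start with a digit; the "observed" pattern is a hypothetical model of the behaviour described above (Windows-1252 bytes 128–255 tolerated after, but not as, the first character), not Stata's documented rule.

```python
import re

# Quoted Stata 12 rule: 1-32 characters from A-Z, a-z, 0-9 and _,
# assuming the first character must be a letter or an underscore.
STRICT = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")

# Hypothetical model of the observed behaviour, applied to raw bytes:
# upper-range bytes (0x80-0xFF) accepted only after the first character.
OBSERVED = re.compile(rb"[A-Za-z_][A-Za-z0-9_\x80-\xff]{0,31}")

def strict_ok(name):
    return STRICT.fullmatch(name) is not None

def observed_ok(name, encoding="cp1252"):
    return OBSERVED.fullmatch(name.encode(encoding)) is not None

for name in ["var_æ", "æ_var"]:
    print(name, strict_ok(name), observed_ok(name))
# var_æ False True
# æ_var False False
```

The two outcomes match the Stata 12 log that follows: var_æ can be created, æ_var is rejected as an invalid name.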

              I have seen "old" .dta files created by other packages whose variable names start with one of æ, ø, å. If I remember correctly, after loading such a file into Stata, the variable name looks like any other in describe etc., but every operation on the variable fails, even rename; Mata, however, could be used to change the illegal name.

              I suspect Celia's original files use Windows-1252, and I use that encoding below.
              Using Stata/MP 12.1 Revision 23 Jan 2014:
              Code:
              . clear
              . set obs 3
               
              . foreach W1252 of numlist 230 248  229  {
                      
              .         gen var_`=char(`W1252')' = char(`W1252')      
              .         lab var var_`=char(`W1252')' "var_`=char(`W1252')'"        
              .         capt noi gen `=char(`W1252')'_var = .   /* FAIL */
                }
              
              æ_var invalid name
              ø_var invalid name
              å_var invalid name
              
              . des
              
              Contains data
                obs:             3                          
               vars:             3                          
               size:             9                          
              -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                            storage  display     value
              variable name   type   format      label      variable label
              -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              var_æ           str1   %9s                    var_æ
              var_ø           str1   %9s                    var_ø
              var_å           str1   %9s                    var_å
              -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
              
              . list
              
                   +-----------------------+
                   | var_æ   var_ø   var_å |
                   |-----------------------|
                1. |     æ       ø       å |
                2. |     æ       ø       å |
                3. |     æ       ø       å |
                   +-----------------------+
              
              . save æøå.dta


              Then, using Stata/MP 15.1 Revision 08 Mar 2018:
              Code:
              . dtaversion  æøå.dta
                (file "æøå.dta" is .dta-format 115 from Stata 12)
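-dtaversion- reports which .dta format a file uses. As a rough illustration of where that number lives (a hypothetical helper, not how -dtaversion- is implemented): in older binary formats such as 115, the first byte of the file is the format number, while formats 117 and later begin with an XML-like `<stata_dta>` header containing a `<release>` tag. Format 118, introduced with Stata 14, is the first to store strings as UTF-8, so files in format 117 or earlier are the candidates for -unicode translate-.

```python
import re

def dta_format(header):
    """Return the .dta format number from the first bytes of a file (sketch)."""
    if header.startswith(b"<stata_dta>"):
        # Formats 117+: <stata_dta><header><release>118</release>...
        m = re.search(rb"<release>(\d+)</release>", header)
        return int(m.group(1)) if m else None
    # Older binary formats (e.g. 113, 114, 115): the first byte is the format number.
    return header[0] if header else None

def needs_unicode_translation(fmt):
    # Formats before 118 (pre-Stata 14) do not store strings as UTF-8.
    return fmt is not None and fmt < 118

old = bytes([115, 2, 1, 0])  # hypothetical first bytes of a format-115 file
new = b"<stata_dta><header><release>118</release><byteorder>LSF</byteorder>"
print(dta_format(old), needs_unicode_translation(dta_format(old)))  # 115 True
print(dta_format(new), needs_unicode_translation(dta_format(new)))  # 118 False
```

With a real file, the header could be read with `open(path, "rb").read(64)`.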
              Converting to UTF-8:

              Code:
              unicode encoding set Windows-1252
              unicode translate æøå.dta
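Conceptually, these two commands ask Stata to reinterpret the stored bytes as Windows-1252 and rewrite them as UTF-8. A Python sketch of that conversion for a single string (an illustration of the idea, not Stata's implementation). Windows-1252 leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), and strict decoding fails on them, which is one hint that a guessed encoding is wrong.

```python
def to_utf8(raw, source_encoding="cp1252"):
    """Reinterpret legacy bytes in source_encoding and re-encode them as UTF-8."""
    # Strict decoding: bytes undefined in the source encoding raise UnicodeDecodeError.
    return raw.decode(source_encoding).encode("utf-8")

legacy = b"var_\xe6"  # "var_æ" as a single-byte Windows-1252 string
converted = to_utf8(legacy)
print(converted)  # b'var_\xc3\xa6'

try:
    to_utf8(b"\x81")  # 0x81 has no mapping in Windows-1252
except UnicodeDecodeError as err:
    print("cannot translate:", err.reason)
```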
              Code:
              . use æøå.dta
              
              . d
              
              Contains data from æøå.dta
                obs:             3                          
               vars:             3                          10 Mar 2018 18:19
               size:            18                          
              -------------------------------------------------------------------------------------------------------------------------------------------------
                            storage   display    value
              variable name   type    format     label      variable label
              -------------------------------------------------------------------------------------------------------------------------------------------------
              var_æ           str2    %9s                   var_æ
              var_ø           str2    %9s                   var_ø
              var_å           str2    %9s                   var_å
              -------------------------------------------------------------------------------------------------------------------------------------------------
              Sorted by:
              
              . l
              
                   +-----------------------+
                   | var_æ   var_ø   var_å |
                   |-----------------------|
                1. |     æ       ø       å |
                2. |     æ       ø       å |
                3. |     æ       ø       å |
                   +-----------------------+
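The -describe- output also shows the translation at the byte level: the storage type changes from str1 to str2 and the data size from 9 to 18. That is what UTF-8 predicts, since æ, ø and å each take one byte in Windows-1252 but two bytes in UTF-8. A quick Python check of the arithmetic for this example (3 observations, 3 single-character string variables):

```python
def storage(letters, n_obs, enc):
    """Bytes per value (Stata strN width) and total string bytes for the example."""
    widths = [len(ch.encode(enc)) for ch in letters]
    return max(widths), n_obs * sum(widths)

letters = ["æ", "ø", "å"]  # one variable per letter, as in the example above
for enc in ("cp1252", "utf-8"):
    width, size = storage(letters, 3, enc)
    print(enc, f"str{width}", size)
# cp1252 str1 9
# utf-8 str2 18
```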
              Last edited by Bjarte Aagnes; 10 Mar 2018, 12:08.
