Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Changing letters to numbers (not integers)

    I'm pretty new to Stata and have a basic idea on programming (only a bit of python). I know how to do what I need in Excel but the data I'm dealing with now is more than 30'000'000 observations so cannot export to excel.



    my observation looks like this (let's call the var: letters):
    a
    a
    c
    b
    c

    and I have another table with two columns that shows (this table is now in excel - my first problem is how to use it in stata) :
    a=5
    b=3
    c=10

    I want to make a new variable (let's call this var: numbers) that at the end of operation will look like that:
    5
    5
    10
    3
    10

    Please help me!
    Thank you very much!



  • #2
    Take a look at the encode command.
    Jorge Eduardo Pérez Pérez
    www.jorgeperezperez.com

    Comment


    • #3
      Encode will assign the codes of its own. There is nothing in the original email that would suggest that a=1, b=2, and c=3. In fact the example shown contradicts this assumption.

      Do not use encode in this case.

      Code:
      clear
      input str1 s
      a
      a
      c
      b
      c
      end
      
      encode s, generate(x)
      list, nolabel
      
      
      
           +-------+
           | s   x |
           |-------|
        1. | a   1 |
        2. | a   1 |
        3. | c   3 |
        4. | b   2 |
        5. | c   3 |
           +-------+
      Merge with a dictionary was a good recommendation, but requires additional files. For a small number of values, I'd prefer generate/replace:
      Code:
      generate y=.
      replace y=5 if s=="a"
      replace y=3 if s=="b"
      replace y=10 if s=="c"
      There are other ways too.

      Best, Sergiy

      Comment


      • #4
        Originally posted by Sergiy Radyakin View Post
        Encode will assign the codes of its own.
        Unless you pass in a pre-defined label:
        Code:
        lab def numbers 3 "b" ///
                        5 "a" ///
                       10 "c"
        encode letters, gen(numbers) lab(numbers) noextend
        which is, in general, the easiest (and least error-prone) way to encode a variable according to a given value label. If you can get the value label from elsewhere without having to type it in manually, then all the better.

        Comment


        • #5
          encode with a previously generated value label will do the job. It will assign values based on the value label., instead of 1,2,3. I think I had to be more specific before. Just create a value label and then use the value label's name as the new variable name when you encode.

          Code:
          clear
          input str1 s
          a
          a
          c
          b
          c
          end
          label define snum 5 "a" 3 "b" 10 "c"
          encode s, gen(snum)
          Jorge Eduardo Pérez Pérez
          www.jorgeperezperez.com

          Comment


          • #6
            Another solution:

            Suppose your MS Excel file looks like

            Click image for larger version

Name:	xlsx_data.png
Views:	1
Size:	6.4 KB
ID:	284591

            Then try something like:

            Code:
            clear
            set more off
            
            *----- import MS Excel file and save -----
            
            import excel using "keyfile.xlsx"
            
            tempfile keyfile
            save "`keyfile'"
            
            *----- load file with literals -----
            
            clear
            
            input ///
            str1 A
            a
            a
            c
            b
            c
            end
            
            *----- merge with keys in MS Excel data -----
            
            merge m:1 A using "`keyfile'", assert(match) nogen
            
            list
            
            *----- add value labels if desired -----
            
            // install with -ssc describe labutil-
            labmask B, values(A)
            
            list
            list, nolabel
            The order of the observations is modified but it is easy to get it back to original form. Say so if you need this.
            You should:

            1. Read the FAQ carefully.

            2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

            3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

            4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

            Comment

            Working...
            X