  • How to convert string id variable starting with "00" to numeric without losing the "00"?

    Hi, could anyone help me with the following two issues?

    First, my id variable has values such as 001, 002, ... 011, 012, ... 099, and they are all stored as string.
    I thought I should convert them to numeric (please say no if this is not what I am supposed to do), but once I converted them, all the leading zeros were gone.
    How can I keep those zeros when I convert the variable to numeric, and is it really necessary or desirable to keep them? (They are the original ids, so I think I should keep them.)

    My second question is:
    If some group id variables start with a letter (e.g. this household belongs to G35) and are stored as string, can I just keep them as string, or should I convert them to something else that works in Stata?

    Sorry for bombarding you with so many questions.

    Thanks.

  • #2
    In general, if you're not going to do any calculations on the id variable (and if it's really just an id variable, usually one doesn't do that) then there is no real reason to convert them to numeric. The main reason I can think of for converting them to numeric is if you need to -xtset- your data with id as the panel (group) variable. -xtset- requires a numeric variable.

    Now, numeric variables neither have nor fail to have leading zeroes. It's just a matter of how you display them. If you run
    Code:
    destring id, replace format(%03.0f)
    and then -list- or -browse- the data you will see the id's listed as three digit numbers with leading zeroes. But the actual numbers that Stata is working with are just 1, 2, 3, 4, ...99, ... The leading zeroes are just a matter of how you choose to display the data.

    If id variables contain non-numeric characters, then, of course, you need to keep them as string. If you want to have a corresponding numeric variable to use with -xtset- you can get that with -encode- or -egen, group()-. But again, if it's really an id variable, I would leave it as a string unless you need to use it with -xtset-.
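
    As a minimal sketch of the -encode- route, assuming a hypothetical group variable hhgroup and a time variable year (both names and all values are made up for illustration):
    Code:
    * toy data: string group ids that contain a letter
    clear
    input str3 hhgroup int year
    "G35" 2001
    "G35" 2002
    "G07" 2001
    "G07" 2002
    end
    * encode creates a numeric variable whose value labels are the original strings
    encode hhgroup, generate(groupnum)
    * the numeric version can then serve as the panel variable
    xtset groupnum year
    The original string variable is left untouched, so it remains available for merging or listing.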



    • #3
      Leading zeros in a numeric variable are a matter of display format (only).

      Code:
      destring id, gen(numid)
      would generate a new variable, to which you can then assign a display format such as %03.0f. You can also check whether you lost information by (e.g.)

      Code:
      isid numid
      not forgetting to specify a time variable too for panel or longitudinal data needing one.
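
      A minimal sketch of that check for panel data, assuming a hypothetical time variable named year and made-up values:
      Code:
      * toy panel: two ids observed in two years
      clear
      input str3 id int year
      "001" 2001
      "001" 2002
      "012" 2001
      "012" 2002
      end
      destring id, generate(numid)
      * exits with an error unless numid and year jointly identify each observation
      isid numid year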

      On the other hand, a variable with values like G35 sounds like something you may want to keep as string.



      • #4
        #2 by Clyde Schechter and #3 say very similar things really, but note that destring doesn't support a format() option. It could make sense as specifying the display format of the new variable(s) but it's not part of the syntax.

        It is tostring that supports a format() option as something it often needs.
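
        A minimal sketch of the two steps done separately, with made-up ids; the variable names numid and id2 are only illustrative:
        Code:
        * toy data: string ids with leading zeros
        clear
        input str3 id
        "001"
        "012"
        "099"
        end
        * convert first, then set the display format in a separate command
        destring id, generate(numid)
        format numid %03.0f
        list
        * going back to string: here format() is part of the tostring syntax
        tostring numid, generate(id2) format(%03.0f)
        * the round trip should reproduce the original strings
        assert id2 == id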



        • #5
          Nick, as usual, is right. My mistake.



          • #6
            Thanks so much to both of you for excellent answers to my question.



            • #7
              Though Clyde mentioned it briefly, I will zoom in on the following feature of -egen, group()-.

              If you start with data like this
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str3 id
              "003"
              "006"
              "099"
              end
              you can generate a nicely labelled numerical identifier like this
              Code:
              . egen numid = group(id), label
              
              . list
              
                   +-------------+
                   |  id   numid |
                   |-------------|
                1. | 003     003 |
                2. | 006     006 |
                3. | 099     099 |
                   +-------------+
              
              . tab numid
              
                group(id) |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                      003 |          1       33.33       33.33
                      006 |          1       33.33       66.67
                      099 |          1       33.33      100.00
              ------------+-----------------------------------
                    Total |          3      100.00
              
              . tab numid, nol
              
                group(id) |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        1 |          1       33.33       33.33
                        2 |          1       33.33       66.67
                        3 |          1       33.33      100.00
              ------------+-----------------------------------
                    Total |          3      100.00
              So your numerical id will be labelled to look exactly like your original string id, but under the labels you have a numeric identifier that runs from 1 to the number of distinct ids, which is convenient for all purposes.



              • #8
                Thanks so much, Joro, for a wonderful answer with a great illustration!!
