  • Identify First Two Digits in a variable

    Hello, I am working with a variable that is 7 digits long and need to identify the first two digits, as they represent a geographic area. I then want to generate a region variable and place each data record into a category, i.e. all values that start with 10 are region A, all values that start with 24 are region B, etc. Help creating this syntax would be appreciated.

  • #2
    Code:
    gen region=floor(id/100000)
    gets you partway there. Then it would take a messy set of replace commands to get A=10, B=24.



    • #3
      You don't tell us whether your variable is a string whose contents happen to be numbers, or is a numeric variable. The approaches are completely different.

      If it is a string variable and if you are certain it is always 7 digits long then it's very easy:

      Code:
      gen first_two_digits = substr(my_variable, 1, 2)
      That creates a new string variable containing the first two digits. If you want to convert that to a number, just run -destring first_two_digits, replace-.

      If your variable is already numeric, and is always 7 digits long and you need the first two digits:
      Code:
      gen first_two_digits = floor(my_variable/100000)
      As for attaching region A, etc. to these, this, too, depends. It depends in large part on how many different two-digit numbers there are and how many regions there are. The above should get you started. To go farther, you need to tell us more.
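
      As a quick check of the two approaches above, here is a minimal sketch on made-up data. The two ID values and the names my_string and first_two_str are assumptions for illustration only; they do not come from the original poster's dataset.

      ```stata
      * Toy data: two made-up 7-digit numeric IDs (illustrative values only)
      clear
      input long my_variable
      1012345
      2498765
      end

      * Numeric route: dividing by 100000 and flooring drops the last five digits
      gen first_two_digits = floor(my_variable/100000)

      * String route, shown for comparison on a string copy of the same IDs
      gen str7 my_string = string(my_variable, "%07.0f")
      gen first_two_str = substr(my_string, 1, 2)

      list
      ```

      On these two rows both routes should agree, one as a number and the other as a string.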



      • #4
        Thank you Ben and Clyde. It is a numeric variable and there are only 9 regions in total.



        • #5
          So the next step is just:
          Code:
          gen str code=""
          replace code="A" if region==10
          replace code="B" if region==24
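
          If a labelled numeric variable is acceptable instead of a string, value labels are one possible alternative. This is only a sketch: it assumes region is the numeric variable created in #2, region_lbl is a hypothetical label name, and only the two mappings mentioned in this thread (10 = A, 24 = B) are filled in.

          ```stata
          * Assumes region is the numeric two-digit variable created in #2
          * region_lbl is a hypothetical label name; add the remaining codes as needed
          label define region_lbl 10 "A" 24 "B"
          label values region region_lbl

          * Check the result; unlabelled codes show up as plain numbers
          tabulate region, missing
          ```

          With only 9 regions, the remaining seven pairs can be appended to the same label define line.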



          • #6
            So, my next step is to identify the subregions within these codes which are captured in the third and fourth number in the variable. How would I identify the 3rd and 4th digit?



            • #7
              From Clyde's answer it follows that you need to use functions here. For a numeric argument, this shows technique:

              Code:
              . di substr(string(1234567), 3, 2)
              34
              
              . di real(substr(string(1234567), 3, 2))
              34
              In your case use the generate command and the name of your variable, not 1234567.
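
              Applied to a variable, that might look like the sketch below. Here my_variable stands in for your own 7-digit numeric variable and sub_region is a hypothetical name for the result; the explicit "%07.0f" format is an assumption that every value should be read as exactly 7 digits.

              ```stata
              * my_variable is a placeholder for your own 7-digit numeric variable
              * sub_region is a hypothetical name for the new numeric variable
              gen sub_region = real(substr(string(my_variable, "%07.0f"), 3, 2))
              ```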



              • #8
                Well, since it seems that you have multiple digit extractions to do, you are better off first creating a string version of your variable: it's a lot easier to pull out specific digits from a string than to build complicated expressions with -mod()- and -floor()- etc.

                Code:
                tostring my_variable, gen(my_string_variable) format(%07.0f)
                gen third_and_fourth_digits = substr(my_string_variable, 3, 2)
                And if you need that variable with the third_and_fourth_digits to be numeric, then just change the last line to -gen third_and_fourth_digits = real(substr(my_string_variable, 3, 2))-.

                And if you will later need to extract yet other digits, you can do that just as easily with appropriate arguments to the -substr()- function. See -help substr()-.
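
                For comparison, the arithmetic route mentioned above could be sketched as follows. It assumes the two lines of code in this post have already been run, so that third_and_fourth_digits exists as a string variable; digits_3_4_num is a hypothetical name.

                ```stata
                * Arithmetic route: drop the last three digits, then keep the last two of what remains
                gen digits_3_4_num = mod(floor(my_variable/1000), 100)

                * Stops with an error if the arithmetic and string routes ever disagree
                assert digits_3_4_num == real(third_and_fourth_digits)
                ```

                Every different digit position needs its own divisor and modulus here, whereas with the string version only the -substr()- arguments change.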



                • #9
                  Hmm, not working for me. The variable I am using is numeric, not a string?



                  • #10
                    Linz: Last week my television wasn't working. Can you tell me why? Naturally, you can't, as that's far too little information.

                    Sorry to be brutal, but in order not to waste your own time, you must tell us exactly what you tried and (if it's not otherwise obvious) exactly why it is not what you want.

                    Please go back and study the FAQ Advice, esp. #12.

                    The people who've answered this thread are very willing to help -- between us we've answered several thousand posts in 2 years here -- but you must ask a question we can answer.

                    FWIW, my own code was geared to numeric input.



                    • #11
                      Thanks Clyde, helpful as always. Nick - feedback noted



                      • #12
                        Linz:

                        Good that you solved your problem, somehow.

                        But I can't see that the last few posts will make sense to anyone else, as in #9 you declared a problem and in #11 it appears that it is solved.

                        For the sake of people interested in this thread, now and in the future, which is why the forum is public, could you spare a few lines to summarize the solution adopted?



                        • #13
                          When I attempted to use the code suggested by Nick above:

                          . di substr(string(1234567), 3, 2)
                          34

                          . di real(substr(string(1234567), 3, 2))
                          34

                          I received an invalid name error. I assumed I had typed the variable name incorrectly, but after two or three additional attempts with the correct variable name I posted the above response to say the code was not working. The invalid name error was what made me think that the problem was a string/numeric problem.

                          When I attempted Clyde's code:

                          tostring my_variable, gen(my_string_variable) format(%07.0f)
                          gen third_and_fourth_digits = substr(my_string_variable, 3, 2)

                          I was able to generate the variable that I had been attempting, namely I was able to generate a variable that identified the 3rd and 4th digit of a 7 character variable, by first converting it from a numeric to a string variable.

                          Hope this is helpful, thanks for your assistance in being a responsible forum participant Nick.



                          • #14
                            Thanks for the closure. My code and results were copied and pasted from a Results window in Stata. You'd need to omit the period prompts. Otherwise, good; your problem is solved.
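
                            For instance, the first command from #7 would be entered like this in the Command window or a do-file, with no leading period (the period is only the prompt that Stata echoes in the Results window):

                            ```stata
                            * Same command as in #7, typed without the ". " prompt
                            display substr(string(1234567), 3, 2)
                            ```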

                            Comment


                            • #15
                              Hi everyone,

                              I am stuck with a similar problem.

                              I want the first and second digits of a -long- variable that holds the zip codes I am interested in.
                              But the postal codes do not always have the same number of digits.

                              For example:
                              1. For postal codes from 0 to 9000, I want the first digit.
                              2. For postal codes from 10000 up to 55000, I want the first two digits, please.
                              Best,

                              Michael
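
                              One possible sketch for the variable-length case described above. It assumes the codes are whole numbers stored in a variable called zip (a hypothetical name), that four-digit codes should yield one digit and five-digit codes two digits, and that codes below 1000 are left missing because the post does not say how to treat them.

                              ```stata
                              * zip is a hypothetical name for the -long- postal-code variable
                              * 1000-9999 gives the first digit; 10000-99999 gives the first two digits
                              * Codes below 1000 and missing values are left missing (an assumption)
                              gen lead_digits = floor(zip/1000) if zip >= 1000 & zip < 100000
                              ```

                              Dividing by 1000 works for both lengths because it always drops the last three digits, so no separate rule per code length is needed.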
