  • Including a leading zero to Brazilian zip codes

    Hi all. I have a dataset with 993,664 Brazilian zip codes (called CEPs) that I need to use.

    CEPs are eight digits long. Several CEPs have one leading zero. However, the dataset contains some CEPs without the leading zero.

    Thus, I need to add the leading zero to these CEPs. I’m having trouble doing it.

    What I tried to do:

    Code:
    format cep_string %08.0f
    // it didn’t work - error r(120)

    Code:
    gen cep_1 = substr(8 * "0", 1, 8 - length(cep_string)) + cep_string
    
    gen cep_2 = string(real(cep_string), "%08.0f")
    // I tried to follow what Nick Cox did in another post on this issue. The string values were rounded up.

    Important: only CEPs with fewer than eight digits must receive a leading zero. CEPs that already have eight digits must remain as they are.

    Can someone help?

    Thank you.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str74 ds_endereco_locvt_original long cep str8 cep_string
    "AVENIDA EULÁLIO MODESTO, N 2408"                                      68960000 "6.90e+07"
    "RUA CARLOS DE LIMA 1729 - 413-2538"                                    76929000 "7.69e+07"
    "CONJUNTO HABITACIONAL FRANCISCO WILKER CAMPELO S/N"                    62970000 "6.30e+07"
    "AV. OZANAN LEVINDO COELHO, 590"                                        37280000 "3.73e+07"
    "RUA SÃO PAULO NÚMERO 3435"                                           76974000 "7.70e+07"
    "RUA POTIGUARA, 340"                                                    16100000 "1.61e+07"
    "RUA ABATI, 10"                                                         30525260 "3.05e+07"
    "AVENIDA SYCABA, 1200"                                                  32611292 "3.26e+07"
    "AV. FREI BENJAMIM, S/N - ZONA URBANA"                                  45025101 "4.50e+07"
    "RUA WILSON LAGO, 392"                                                  25540020 "2.55e+07"
    "RUA DES PONTES VIEIRA, 575"                                            62680000 "6.27e+07"
    "RUA CARLOS PINTO FILHO, S/N"                                           12630000 "1.26e+07"
    "RUA SAN GENARO, 320"                                                    8584584 "8584584" 
    "VILA SAQUINHO"                                                         63270000 "6.33e+07"
    "AVENIDA TAPAJÓS S/N"                                                  68379500 "6.84e+07"
    "SEPN 707/907"                                                          70790075 "7.08e+07"
    "CONSULADO GERAL DO BRASIL EM MIAMI"                                    11111111 "1.11e+07"
    "AV MANOEL NOVAES, S/N - ZONA RURAL"                                    44920000 "4.49e+07"
    "VILA DO POÇO GRANDE"                                                  63580000 "6.36e+07"
    "RUA XEXEU, 72 - ZONA URBANA"                                           86702320 "8.67e+07"
    "RUA PADRE JOÃO REITZ, N 435"                                          88960000 "8.90e+07"
    "QUINTA AVENIDA, 550 - ZONA URBANA"                                     41746900 "4.17e+07"
    "RUA PARIQUIS  SN"                                                      66030690 "6.60e+07"
    "AV. PRESIDENTE KENNEDY, 635"                                           24445000 "2.44e+07"
    "RUA MONTE ALEGRE, 984"                                                  5014001 "5014001" 
    "RUA SANTA HELENA, S/N,  PARQUE SÃO PEDRO, TARUMA"                     69021015 "6.90e+07"
    "ESTRADA MUNICIPAL"                                                     18310000 "1.83e+07"
    "BAIRRO DOS PAES, S/N"                                                  18310000 "1.83e+07"
    "RUA CUIABÁ, 401"                                                      99345000 "9.93e+07"
    "RUA HUMBERTO DE CAMPOS, S/N - ZONA URBANA"                             45638000 "4.56e+07"
    "RUA APARICIO MARIENSE, 1777"                                           97670000 "9.77e+07"
    "AV. NILTON PENA BOTELHO 458"                                           27175000 "2.72e+07"
    "RUA QUATORZE DE JULHO, S/N"                                            26086425 "2.61e+07"
    "SÍTIO JUCA  PE-130, KM 08"                                            55790000 "5.58e+07"
    "AV MASCARENHAS DE MORAIS S/N"                                          57480000 "5.75e+07"
    "RUA MADRE MONICA MARIA, N. 661 CEP 87.040-440"                         87040440 "8.70e+07"
    "RUA CARLOS CHAGAS, 296"                                                39880000 "3.99e+07"
    "RUA LAGOA SECA, 67"                                                     3462100 "3462100" 
    "POVOADO BOA SORTE - ZONA RURAL"                                        48580000 "4.86e+07"
    "RUA PROFESSOR PEDRO VIRIATO PARIGOT DE SOUZA, 5300"                    81280330 "8.13e+07"
    "PIQ( PROGRAMA DE INTEGRAÇÃO DE QUADRAS) Q. 06 LT. 02, SETOR VEREDAS" 72726125 "7.27e+07"
    "RUA BEZERRA DA PALMA ,SN               "                               50770690 "5.08e+07"
    "RUA RAFAEL ANUNCIATO, 255"                                              8500000 "8500000" 
    "RUA HUM, 188"                                                          35480000 "3.55e+07"
    "AV. DO CONTORNO, S/N"                                                  65080805 "6.51e+07"
    "RUA JOÃO FRANCISCO LEAL, 142"                                         13223091 "1.32e+07"
    "AVENIDA MAZAGÃO, N. 105"                                              68920000 "6.89e+07"
    "AV DUQUE DE CAXIAS"                                                    79240000 "7.92e+07"
    "RUA ANTONIO SAVI, 379"                                                 88940000 "8.89e+07"
    "AV MAGALHÃES BARATA , S/N"                                            68610000 "6.86e+07"
    "RUA DOS IMIGRANTES, N. 184"                                            88955000 "8.90e+07"
    "RUA SAO JOSE S/N"                                                      68524000 "6.85e+07"
    "RUA ACÁCIA DOS SANTOS PEREIRA, S/N"                                   11534730 "1.15e+07"
    "PRAÇA DOM JOSÉ THOMAZ, S/N"                                          49075400 "4.91e+07"
    "RUA JOSÉ SARTURI, 01"                                                 99490000 "9.95e+07"
    "PRAÇA ADALBERTO RIBEIRO, S/N"                                         65130000 "6.51e+07"
    "R MANDAGUAÇU, 100"                                                    87070220 "8.71e+07"
    "AV. CARLOS LIVIERO, 600"                                                4186100 "4186100" 
    "R. TIRADENTES, 847"                                                    79904646 "7.99e+07"
    "RUA ANTONIO PEREIRA,1495"                                              60760525 "6.08e+07"
    "RUA PORTO DO BEZERRA,  25"                                              8440000 "8440000" 
    "ESTRADA GERAL, S/N"                                                    88590000 "8.86e+07"
    "LOCALIDADE LAGOA SECA"                                                 64235000 "6.42e+07"
    "RUA PADRE AGOBAR VALENÇA, S/N (AO LADO DO CAIC)"                      55297400 "5.53e+07"
    "RUA MAL.  DEODORO DA FONSECA, 908"                                     86410000 "8.64e+07"
    "RUA BOTUPORÃ, QUADRA 2, CHÁCARA PERSEVERANÇA, S/N - ZONA URBANA"    41100060 "4.11e+07"
    "R. PRINCIPAL, S/N"                                                     29330000 "2.93e+07"
    "RUA PEDRO MARCHI, 100"                                                 13295000 "1.33e+07"
    "RUA ITAPETI, 500"                                                       8693210 "8693210" 
    "AVENIDA MAURO RAMOS, N. 275"                                           88020301 "8.80e+07"
    end

  • #2
    Will this do what you ask for?

    Code:
    gen str8 code = string(cep, "%08.0f")
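A minimal sketch of how the suggestion above could be checked on a small sample. The four values are taken from the example data in the question; the two `assert` lines are an added sanity check, not part of the original suggestion.

```stata
clear
input long cep
3462100
8584584
11111111
88020301
end

* %08.0f pads with zeros on the left up to a width of eight
gen str8 code = string(cep, "%08.0f")

* every result should be exactly eight characters long ...
assert strlen(code) == 8
* ... and should convert back to the original number
assert real(code) == cep

list
```

If either `assert` fails on the full dataset, some values of cep are missing or have more than eight digits and need a separate look.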



    • #3
      Your code gives us cep_string as data without saying how it was produced.

      But

      1. cep_string is a string variable, so assigning it a numeric display format doesn't make sense to Stata. In any case, changing the display format never changes what is stored.

      2. Worse, cep_string is mangled beyond repair. No magic, white or otherwise, will retrieve the digits that were lost when it was created.

      Otherwise the solution is to work with cep.
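The two points above can be sketched as follows, assuming the variables are named cep and cep_string as in the question. The names mangled and cep_fixed are hypothetical, and the four observations are copied from the example data.

```stata
clear
input long cep str8 cep_string
68960000 "6.90e+07"
8584584  "8584584"
5014001  "5014001"
11111111 "1.11e+07"
end

* strings stored in scientific notation have lost digits for good
gen byte mangled = strpos(cep_string, "e+") > 0
count if mangled

* so rebuild the eight-character code from the numeric variable instead
gen str8 cep_fixed = strofreal(cep, "%08.0f")

list
```

The count shows how many string values are beyond repair; cep_fixed does not depend on cep_string at all.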

      "what Nick Cox did in another post on this issue" is unfortunately not enough detail to remind me when and which that was. But this may help:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long cep
       3462100
       4186100
       5014001
       8440000
       8500000
       8584584
       8693210
      11111111
      11534730
      12630000
      13223091
      13295000
      16100000
      18310000
      18310000
      24445000
      25540020
      26086425
      27175000
      29330000
      30525260
      32611292
      35480000
      37280000
      39880000
      41100060
      41746900
      44920000
      45025101
      45638000
      48580000
      49075400
      50770690
      55297400
      55790000
      57480000
      60760525
      62680000
      62970000
      63270000
      63580000
      64235000
      65080805
      65130000
      66030690
      68379500
      68524000
      68610000
      68920000
      68960000
      69021015
      70790075
      72726125
      76929000
      76974000
      79240000
      79904646
      81280330
      86410000
      86702320
      87040440
      87070220
      88020301
      88590000
      88940000
      88955000
      88960000
      97670000
      99345000
      99490000
      end


      Code:
      . clonevar cep2 = cep

      . format cep2 %08.0f

      . gen cep3 = cond(cep < 1e7, "0", "") + strofreal(cep, "%8.0f")

      . list
      
           +--------------------------------+
           |      cep       cep2       cep3 |
           |--------------------------------|
        1. |  3462100   03462100   03462100 |
        2. |  4186100   04186100   04186100 |
        3. |  5014001   05014001   05014001 |
        4. |  8440000   08440000   08440000 |
        5. |  8500000   08500000   08500000 |
           |--------------------------------|
        6. |  8584584   08584584   08584584 |
        7. |  8693210   08693210   08693210 |
        8. | 11111111   11111111   11111111 |
        9. | 11534730   11534730   11534730 |
       10. | 12630000   12630000   12630000 |
           |--------------------------------|
       11. | 13223091   13223091   13223091 |
       12. | 13295000   13295000   13295000 |
       13. | 16100000   16100000   16100000 |
       14. | 18310000   18310000   18310000 |
       15. | 18310000   18310000   18310000 |
           |--------------------------------|
       16. | 24445000   24445000   24445000 |
       17. | 25540020   25540020   25540020 |
       18. | 26086425   26086425   26086425 |
       19. | 27175000   27175000   27175000 |
       20. | 29330000   29330000   29330000 |
           |--------------------------------|
       21. | 30525260   30525260   30525260 |
       22. | 32611292   32611292   32611292 |
       23. | 35480000   35480000   35480000 |
       24. | 37280000   37280000   37280000 |
       25. | 39880000   39880000   39880000 |
           |--------------------------------|
       26. | 41100060   41100060   41100060 |
       27. | 41746900   41746900   41746900 |
       28. | 44920000   44920000   44920000 |
       29. | 45025101   45025101   45025101 |
       30. | 45638000   45638000   45638000 |
           |--------------------------------|
       31. | 48580000   48580000   48580000 |
       32. | 49075400   49075400   49075400 |
       33. | 50770690   50770690   50770690 |
       34. | 55297400   55297400   55297400 |
       35. | 55790000   55790000   55790000 |
           |--------------------------------|
       36. | 57480000   57480000   57480000 |
       37. | 60760525   60760525   60760525 |
       38. | 62680000   62680000   62680000 |
       39. | 62970000   62970000   62970000 |
       40. | 63270000   63270000   63270000 |
           |--------------------------------|
       41. | 63580000   63580000   63580000 |
       42. | 64235000   64235000   64235000 |
       43. | 65080805   65080805   65080805 |
       44. | 65130000   65130000   65130000 |
       45. | 66030690   66030690   66030690 |
           |--------------------------------|
       46. | 68379500   68379500   68379500 |
       47. | 68524000   68524000   68524000 |
       48. | 68610000   68610000   68610000 |
       49. | 68920000   68920000   68920000 |
       50. | 68960000   68960000   68960000 |
           |--------------------------------|
       51. | 69021015   69021015   69021015 |
       52. | 70790075   70790075   70790075 |
       53. | 72726125   72726125   72726125 |
       54. | 76929000   76929000   76929000 |
       55. | 76974000   76974000   76974000 |
           |--------------------------------|
       56. | 79240000   79240000   79240000 |
       57. | 79904646   79904646   79904646 |
       58. | 81280330   81280330   81280330 |
       59. | 86410000   86410000   86410000 |
       60. | 86702320   86702320   86702320 |
           |--------------------------------|
       61. | 87040440   87040440   87040440 |
       62. | 87070220   87070220   87070220 |
       63. | 88020301   88020301   88020301 |
       64. | 88590000   88590000   88590000 |
       65. | 88940000   88940000   88940000 |
           |--------------------------------|
       66. | 88955000   88955000   88955000 |
       67. | 88960000   88960000   88960000 |
       68. | 97670000   97670000   97670000 |
       69. | 99345000   99345000   99345000 |
       70. | 99490000   99490000   99490000 |
           +--------------------------------+



      • #4
        Thank you very much Nick. I'm sorry for not including the whole reference. It can be found here: https://www.statalist.org/forums/for...tring-variable



        • #5
          Thanks to Frode as well.



          • #6
            A quick question, Nick: in this command line
            Code:
            gen cep3 = cond(cep < 1e7, "0", "") + strofreal(cep, "%8.0f")
            what is the 1e7 portion of the code doing?



            • #7
              1e7 is 10 million. Any 7-digit integer is less than 10 million, so cep < 1e7 is true exactly for the CEPs that need a leading zero.
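A small sketch of the comparison at work. The variable needs_zero is a hypothetical helper added for illustration, and 9999999 and 10000000 are made-up boundary values, not CEPs from the dataset.

```stata
clear
input long cep
3462100
9999999
10000000
88020301
end

* 1 for any value below 10 million (seven digits or fewer), 0 otherwise
gen byte needs_zero = cep < 1e7

* prepend "0" only where the flag is set
gen str8 cep3 = cond(needs_zero, "0", "") + strofreal(cep, "%8.0f")

list
```

This adds exactly one zero, which is enough when the shortest codes have seven digits. If shorter values could occur, the "%08.0f" format pads to eight characters whatever the length.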



              • #8
                Got it. Thanks Nick.

