  • Removing the last three digits of a four-digit numeric variable

    Hi Statalist,

    I am working with a four-digit integer variable, ISCO08_1. I would like to simply drop the last three digits, and create a one-digit integer variable "ISCO". So far I have tried versions of egen with the cut function, but am generating nothing but missing values. My data and code are below. I am grateful for your advice.

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int ISCO08_1
    3343
    3115
    3343
    3132
    5141
    5322
    3251
    2341
    7511
    5230
    4222
    7127
    2636
    9321
    9111
    9622
    9214
    7223
    3343
    1211
    3411
    5311
    4221
    3434
    9111
    5411
    2341
    5153
    5153
    7411
    3343
    5153
    5230
    2636
    9411
    9111
    4321
    3251
    8171
    5222
    2423
    2529
    8152
    5322
    9411
    3132
    3132
    2641
    5230
    5222
    4222
    2221
    3512
    2212
    2353
    1112
    5141
    9412
    2163
    2412
    2142
    7543
    5322
    8183
    5131
    7126
    5169
    2611
    1311
    9214
    2212
    5312
    2263
    2164
    2514
    3322
    3339
    3313
    1211
    9412
    7322
    7422
    3221
    3313
    5169
    3221
    3115
    1330
    2641
    2636
    2221
    8183
    4321
    2221
    9214
    2341
    9112
    2622
    2330
    7212
    end
    label values ISCO08_1 ISCO08_1

    Code:
    egen ISCO = cut(ISCO08_1), at(1)
    (1842103 missing values generated)
    
    . 
    end of do-file
    
    . codebook ISCO
    
    ----------------------------------------------------------------------------------------------------
    ISCO                                                                                     (unlabeled)
    ----------------------------------------------------------------------------------------------------
    
                      type:  numeric (float)
    
                     range:  [.,.]                        units:  .
             unique values:  0                        missing .:  1,842,103/1,842,103
    
                tabulation:  Freq.  Value
                         1,842,103  .
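A possible explanation for the all-missing result, going by the documented behavior of egen's cut() function: at() expects an ascending list of breakpoints, and observations below the first breakpoint or at or above the last one are set to missing. With the single breakpoint 1, every four-digit code falls outside the range. A minimal sketch on four of the values above, assuming all codes lie between 0 and 9999 (the breakpoint list and the icodes option are choices made for this illustration, not part of the original attempt):

```stata
clear
input int ISCO08_1
3343
5141
9622
1211
end

* Breakpoints every 1000 from 0 to 10000. icodes returns 0, 1, 2, ... for
* the successive intervals, which here coincides with the leading digit.
egen ISCO = cut(ISCO08_1), at(0(1000)10000) icodes
list
```

Without icodes, the new variable would hold the left-hand end of each interval (3000, 5000, and so on) rather than the single digit.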

  • #2
    Rosa:
    Make a copy of your variable with gen bibi = ISCO08_1, then recode the new variable, knowing
    that your minimum value is 1112 (I prefer to begin with 0001 in any case):
    recode bibi (1112/1999=1) (2000/2999=2)(3000/3999=3) (4000/4999=4)(5000/5999=5) ///
    (6000/6999=6)(7000/7999=7) (8000/8999=8)(9000/9999=9)
    Confirm with tab bibi
    NB: No obs with code 6?
    Hope this is practical !



    • #3
      Code:
       gen ISCO= real(substr(string(ISCO08_1), 1, 1))



      • #4

        Code:
        gen ISCO = floor(ISCO08_1/1000)



        • #5
          Thanks for your advice Louis, that was actually what I ended up doing. But I felt like there needed to be a more elegant solution.
          Both Andrew & Nick's suggestions worked, thanks a lot for those!
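For reference, the two one-line suggestions in #3 and #4 can be compared side by side. This is only a sketch on four values from the example data; the variable names ISCO_str and ISCO_num are made up for the illustration.

```stata
clear
input int ISCO08_1
3343
5141
9622
1211
end

* String route (as in #3): convert to text and keep the first character.
gen ISCO_str = real(substr(string(ISCO08_1), 1, 1))

* Arithmetic route (as in #4): divide by 10^3 and discard the fraction.
gen ISCO_num = floor(ISCO08_1/1000)

* The two agree as long as the codes are positive four-digit integers.
assert ISCO_str == ISCO_num
list
```

The arithmetic route generalizes more directly: to drop the last k digits, divide by 10^k before applying floor().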



          • #6
            Hi, now I am trying to remove the last four digits from an eight digit numeric variable, "SIC_codes". Dataex below:

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double SIC_codes
                  -3
                  -3
                  -3
                  -3
                  -3
                  -3
                  -3
                  -3
                  -3
                  -3
                  -3
                  -3
                  -3
            54110201
            83619905
            34950000
            35410200
            26790401
            50230000
            30890308
            87480202
            27520101
            51930200
            36120000
            50510000
            73730000
            27520101
            57120000
            80820000
            50850000
            80690000
            55310100
            83220503
            81119901
            87410000
            60210000
            27310000
            28420102
            20539904
            83610000
            83999909
            30699905
            64110000
            26790301
            48320000
            60220000
            80620000
            49110000
            70110000
            50830300
            81119902
            86210000
            16110000
            80520000
            82210102
            83220300
            28139909
            34929901
            50510217
            81119902
            81119901
            41110101
            20239905
            36690100
            30860000
            84120101
            80620000
            30890608
            82210102
            87420000
            63319908
            35350000
            34410000
            81110200
            30890600
            87429902
            65529901
            80829902
            73891300
            50840000
            17310000
            80620000
            83220000
            70110000
            80110105
            87110000
            86110000
            80620000
            87310302
            35359901
            63310200
            60620000
            80110101
            87310200
            73810000
            22829904
            80990102
            55119901
            80110505
            15410000
            end
            Clearly I have not understood the underlying logic of the real and floor gen options mentioned above, as I can't get them to work in this case (they both produce results other than simply removing the last four digits of the eight digit variable).

            Any advice on how to overcome this roadblock is much appreciated!

            Best,
            Rosa



            • #7
              OK, so

              Code:
               gen sic4 = floor(SIC_codes/10000)
              DID work. Not sure what went wrong the previous times. Thanks again Nick.
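A short sketch of why the divisor matters, using four values from the dataex above: dividing by 10^4 moves the last four digits behind the decimal point, and floor() discards them. Reusing the divisor 1000 from the ISCO case would leave five digits, and the substr() version from #3 keeps only the first character. The if qualifier is an assumption added here, not part of the original command: floor(-3/10000) evaluates to -1, so the -3 codes would otherwise turn into -1.

```stata
clear
input double SIC_codes
-3
54110201
83619905
34950000
end

* Drop the last four digits of the non-negative codes only; the -3 rows
* are left missing instead of being converted to -1.
gen long sic4 = floor(SIC_codes/10000) if SIC_codes >= 0
list
```

Whether the -3 rows should end up missing or keep a special code depends on what -3 means in the data, so adjust the qualifier accordingly.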

