  • Problems with destring CIK numbers

    Hi guys,

    In a dataset, I have rows with fyear (fiscal year) and CIK numbers, which are company identifiers from Compustat.
    See the code below.

    The problem is that the CIK numbers are stored as strings (str7). I need to convert CIK to a numeric variable in order to merge this dataset with my other dataset, where CIK is numeric. I used gen nummericCIK = real(CIK), but what STATA then does is remove all the 0's in the CIK number. It is fine that STATA removes the leading 0's, because my CIK numbers do not start with 0, but it is wrong that STATA removes the 0's in the rest of the number.

    For example, the first CIK number is 0912057. STATA should remove the first 0 here, but not the second zero.

    I tried "replace CIK = subinstr(CIK, "0", 1)" and this works for removing the 0's; however, when I then try to destring the variable, STATA keeps giving me the error that the variable contains nonnumeric characters.

    Anyone who knows what to do?

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int fyear str7 CIK
    1993 "0912057"
    1993 "0032377"
    1993 "0950131"
    1993 "0353944"
    1993 "0038777"
    1993 "0912057"
    1993 "0912057"
    1993 "0868016"
    1993 "0950124"
    1993 "0950123"
    1993 "0950152"
    1993 "0060302"
    1993 "0051296"
    1993 "0950131"
    1993 "0891618"
    1993 "0808450"
    1993 "0096935"
    1993 "0889810"
    1993 "0912057"
    1993 "0950131"
    1993 "0912057"
    1993 "0034501"
    1993 "0898430"
    end
    Last edited by Pepijn Peters; 14 Oct 2019, 03:08.

  • #2
    In addition:

    Just "destringing" the variable does not work; STATA gives the error "CIK contains nonnumeric characters; no replace"



    • #3
      It's not true that real() removes all the zeros; rather, it just ignores leading zeros.

      destring works fine on your data example and produces the same result as real().

      If destring doesn't work on your entire dataset, we need to see problematic values. You can find them by


      Code:
      list CIK if missing(real(CIK))
      or

      Code:
      tab CIK if missing(real(CIK))
      With your data example (thanks!), I get the following, which is the basis for the statements above.



      Code:
      . gen CIKn = real(CIK)
      
      .
      . destring CIK, gen(CIKd)
      CIK: all characters numeric; CIKd generated as long
      
      .
      . list
      
           +-----------------------------------+
           | fyear       CIK     CIKd     CIKn |
           |-----------------------------------|
        1. |  1993   0912057   912057   912057 |
        2. |  1993   0032377    32377    32377 |
        3. |  1993   0950131   950131   950131 |
        4. |  1993   0353944   353944   353944 |
        5. |  1993   0038777    38777    38777 |
           |-----------------------------------|
        6. |  1993   0912057   912057   912057 |
        7. |  1993   0912057   912057   912057 |
        8. |  1993   0868016   868016   868016 |
        9. |  1993   0950124   950124   950124 |
       10. |  1993   0950123   950123   950123 |
           |-----------------------------------|
       11. |  1993   0950152   950152   950152 |
       12. |  1993   0060302    60302    60302 |
       13. |  1993   0051296    51296    51296 |
       14. |  1993   0950131   950131   950131 |
       15. |  1993   0891618   891618   891618 |
           |-----------------------------------|
       16. |  1993   0808450   808450   808450 |
       17. |  1993   0096935    96935    96935 |
       18. |  1993   0889810   889810   889810 |
       19. |  1993   0912057   912057   912057 |
       20. |  1993   0950131   950131   950131 |
           |-----------------------------------|
       21. |  1993   0912057   912057   912057 |
       22. |  1993   0034501    34501    34501 |
       23. |  1993   0898430   898430   898430 |
           +-----------------------------------+
      
      . assert CIKn == CIKd
      If you typed

      Code:
      destring CIK, ignore(0)
      well, your punishment is that you got what you asked for.
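
      To illustrate that last point, here is a minimal sketch (not part of the original posts; the variable names CIK_bad and CIK_ok are made up for illustration) using two values from the data example. It assumes the difference comes from the ignore() option: ignore("0") strips every "0" character, embedded ones included, before converting, whereas plain destring only loses the leading zeros.

      ```stata
      * Illustrative sketch: two CIK values taken from the data example above
      clear
      input str7 CIK
      "0912057"
      "0032377"
      end

      * ignore("0") removes every "0" character before converting
      destring CIK, generate(CIK_bad) ignore("0")

      * plain destring converts the string as is; only leading zeros disappear
      destring CIK, generate(CIK_ok)

      list
      ```

      For the first observation, CIK_bad should hold 91257 while CIK_ok holds 912057, so identifiers converted with ignore("0") would no longer match the numeric CIK in the other dataset.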
