
  • Distribution of firms by industry (based on SIC codes)

    Hello,
    Hello,
    I want to see how firms are distributed across industry groups (based on the first 2 digits of their SIC codes). Here is what I've written, and the problem I ran into:

    // 1. SIC (2 digits) classification:
    // we create a string variable (sic2) with the first 2 digits of sic in order to identify the industry
    gen sic2=substr(sic,1,2)
    // we encode the variable so that we obtain a numeric variable instead of a string one
    encode sic2, generate (SIC2)
    // next we generate variable SIC_group to group the various sic into the appropriate industry
    recode SIC2 (01/09=0) (10/14=1) (15/17=2) (20/39=3) (40/49=4) (50/51=5) (52/59=6) (60/67=7) (70/89=8) (91/97=9) (else=.), generate(SIC_group)


    Unfortunately, when I check the variable SIC_group, I find that firms with SIC 7363, and so with a SIC2 of 73, are placed in group 6, which is wrong. The same error occurs in other groups and SIC codes.
    Also, do you know which command I can use afterwards to see the number of firms (GVKEYs) per industry group?

    Does anyone have an idea what to do?
    Thank you in advance.

  • #2
    This is where you went wrong.
    Code:
    // we encode the variable so that we obtain a numeric variable instead of a string one
    encode sic2, generate (SIC2)
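    The encode command is designed for assigning numerical codes to non-numeric strings like "France", "Germany", "United States". It numbers the distinct strings 1, 2, 3, ... in sorted order, so "73" is stored as its position in that list rather than as 73, and your recode ranges are then applied to those positions. The output of help encode instructs us: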

    Do not use encode if varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
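A small reproduction may make this concrete. The sketch below uses three made-up SIC codes (not the poster's data) and the poster's own recode rules, with the ranges written as 1/9 instead of 01/09. encode stores each string's rank, so the ranges are compared with 1, 2, 3 rather than with 1, 28, 73, and all three firms land in group 0. With many more distinct prefixes in a real dataset, "73" can presumably end up with a rank between 52 and 59, which would explain the group 6 result.

```stata
* Reproduction with three made-up SIC codes (not the poster's data)
clear
input str4 sic
"0100"
"2834"
"7363"
end

generate sic2 = substr(sic, 1, 2)

* encode numbers the distinct strings 1, 2, 3, ... in sorted order:
* "01" -> 1, "28" -> 2, "73" -> 3 (the value labels still display "01", "28", "73")
encode sic2, generate(SIC2)
list sic sic2 SIC2, nolabel

* the recode rules are therefore applied to 1, 2, 3, not to 1, 28, 73
recode SIC2 (1/9=0) (10/14=1) (15/17=2) (20/39=3) (40/49=4) (50/51=5) (52/59=6) (60/67=7) (70/89=8) (91/97=9) (else=.), generate(SIC_group)
list sic SIC2 SIC_group, nolabel
```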
    Thus your code should be:
    Code:
    // we destring the variable so that we obtain a numeric variable instead of a string one
    destring sic2, generate (SIC2)
    or, doing it in one step:
    Code:
    generate SIC2 = real(substr(sic,1,2))
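Putting the one-step version together with the original recode gives the following sketch, again on the same three made-up codes and assuming sic is a string variable holding values such as "7363". Because real() keeps the numeric value of the two digits, the ranges now act on 1, 28 and 73, and the firms fall into groups 0, 3 and 8.

```stata
* Corrected pipeline on made-up SIC codes (not the poster's data)
clear
input str4 sic
"0100"
"2834"
"7363"
end

* real() converts the 2-digit string to its numeric value: "01" -> 1, "28" -> 28, "73" -> 73
generate SIC2 = real(substr(sic, 1, 2))

* the recode rules now compare actual 2-digit SIC values
recode SIC2 (1/9=0) (10/14=1) (15/17=2) (20/39=3) (40/49=4) (50/51=5) (52/59=6) (60/67=7) (70/89=8) (91/97=9) (else=.), generate(SIC_group)
list sic SIC2 SIC_group
```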
    Last edited by William Lisowski; 04 Apr 2021, 09:15.



    • #3
      encode throws away most of the useful information here and succeeding steps won't fix that easily.

      This may help. It is not the only way to proceed.

      Code:
       * first two characters of the string SIC code, e.g. "73" from "7363"
       gen two = substr(sic, 1, 2)
       
       * inrange() also works on strings, so the 2-digit codes can be compared directly
       gen wanted = ///
       cond(inrange(two, "01", "09"), 1, ///
       cond(inrange(two, "10", "14"), 2, ///
       cond(inrange(two, "15", "17"), 3, ///
       cond(inrange(two, "20", "39"), 4, ///
       cond(inrange(two, "40", "49"), 5, ///
       cond(inrange(two, "50", "51"), 6, ///
       cond(inrange(two, "52", "59"), 7, ///
       cond(inrange(two, "60", "67"), 8, ///
       cond(inrange(two, "70", "89"), 9, ///
       cond(inrange(two, "91", "97"), 10, 11))))))))))
       
       * value labels show the SIC range of each group; 11 collects everything else
       label def wanted 1 "01/09" 2 "10/14" 3 "15/17" 4 "20/39" 5 "40/49" 6 "50/51" 7 "52/59" 8 "60/67" 9 "70/89" 10 "91/97" 11 "other"
       
       label val wanted wanted
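On the follow-up question about the number of GVKEYs per industry group: one common approach is to tag one observation per firm and group with egen's tag() function and then tabulate only the tagged observations. The sketch below uses an invented firm-year panel; the variable group is a hypothetical stand-in for wanted (or a corrected SIC_group), and the gvkey values are made up.

```stata
* Made-up firm-year panel: gvkey and sic values are invented for illustration
clear
input long gvkey str4 sic
1001 "2834"
1001 "2834"
1002 "7363"
1003 "7372"
1003 "7372"
end

* hypothetical industry group from the 2-digit SIC (stand-in for wanted or SIC_group)
generate sic2 = real(substr(sic, 1, 2))
recode sic2 (1/9=0) (10/14=1) (15/17=2) (20/39=3) (40/49=4) (50/51=5) (52/59=6) (60/67=7) (70/89=8) (91/97=9) (else=.), generate(group)

* firmtag = 1 on exactly one observation per distinct (gvkey, group) pair, 0 elsewhere
egen firmtag = tag(gvkey group)

* number of distinct firms in each industry group
tabulate group if firmtag

* the same count stored on every observation of the group
bysort group: egen nfirms = total(firmtag)
```

Without the if firmtag qualifier, tabulate would count firm-years rather than firms, since each firm appears once per year in such a panel.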
