
  • Classification of data by year and industry

    Dear all,

    I am currently conducting research on the performance of IPOs vs. 'normal' firms. To do so, I have to classify the normal firms into 9 categories, based on their total assets (category 1-3) and their margins (category 1-3). The problem I am running into is that these categories have to be constructed per year and per industry. (For example, I need a category 1 for 2008 in industry A, as well as a category 1 for 2008 in industry B, etc.)
    My final goal is to create a new variable, say assetcat, which gives each firm a number from 1 to 3 according to the category it belongs to, given its year and industry. Once I have this variable, I can create the corresponding variable for the margin and then use the group function to combine the two into one variable.

    I have tried to cut the firms into categories based on assets using egen cut; however, this takes neither year nor industry into account.
    I then tried to construct a simple loop that assigns a code from 1 to 3 as follows:

    gen assetcat = . levels stockcodenonipo, local(stockcodes)
    foreach k of local stockcodes {
    forvalues x=2008/2011{
    centile totalassetsnonipo, centile(33 66)
    replace assetcat=1 if totalassetsnonipo<=r(c_1) & stockcodenonipo==`k' & accountperiodsimplifiednoni==`x'
    replace assetcat=2 if totalassetsnonipo>r(c_1) & totalassetsnonipo<=r(c_2)& stockcodenonipo==`k' & accountperiodsimplifiednoni==`x'
    replace assetcat=3 if totalassetsnonipo>r(c_2) stockcodenonipo==`k' & accountperiodsimplifiednoni==`x'
    }
    }

    My 2 problems with this loop are:
    1) there is a mistake somewhere because my assetcat still ends up with all '.' instead of replacing the values with 1,2 or 3
    2) this still does not calculate centiles for each year and industry

    Basically, the fact that these 9 categories have to be regenerated for each year and each industry is really confusing me, and I was wondering whether anyone knows of an efficient solution.

    Thank you so much in advance for any help!

    Best regards,
    Janthe

  • #2
    See if this brings you any closer:

    Code:
    version 12
    clear all
    
    sysuse nlsw88
    
    generate cat=.
    levelsof occup, local(levs)
    forval r=1/3 {
        foreach lev in `levs' {
          local condition `"occup==`lev' & race==`r'"'
          centile wage if `condition', centile(33 66)
          replace cat=1 if wage<=r(c_1) & `condition'
          replace cat=2 if wage<=r(c_2) & wage>r(c_1) & `condition'
          replace cat=3 if wage>r(c_2) & `condition'
          sum wage if occup==`lev'
        }
    }
    Best, Sergiy



    • #3
      Your main problem is that the effect of

      Code:
      centile totalassetsnonipo, centile(33 66)
      is exactly the same, regardless of your two loops. As I understand it, that is not what you want: you want a classification given conditions. If it is what you want, the centile calculation should be moved outside your loops.
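
      To make that concrete, here is a minimal sketch on the auto dataset shipped with Stata (chosen purely for illustration; it is not the poster's data). Without an if qualifier, centile uses every observation on each pass of the loop, so the cut points never change; with an if qualifier they are specific to the group.

      Code:
      ```stata
      sysuse auto, clear
      levelsof foreign, local(groups)
      foreach g of local groups {
          // no if qualifier: all observations are used on every pass
          quietly centile price, centile(33 66)
          display "foreign = `g' (unconditional): " r(c_1) "  " r(c_2)
          // with an if qualifier the cut points belong to this group only
          quietly centile price if foreign == `g', centile(33 66)
          display "foreign = `g' (conditional):   " r(c_1) "  " r(c_2)
      }
      ```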

      Consider the following simplified code.

      Code:
      egen group = group(stockcodenonipo accountperiodsimplifiednoni) 
      gen assetcat = 1 if group < . 
      
      local y totalassetsnonipo
      
      su group, meanonly 
      quietly forval i = 1/`r(max)' {
              centile `y' if group == `i', centile(33 66)
              replace assetcat = 2 if group == `i' & `y' > r(c_1) 
              replace assetcat = 3 if group == `i' & `y' > r(c_2) 
      }
      The key points are
      1. Let Stata do more of the work in looping by constructing a variable indexing the combinations of conditions.
      2. We can safely initialise the category to 1, provided that we correct it to 2 or 3 if needed.
      3. Your long variable names are no doubt needed for some purposes, but not all. You need readable code to be less confused.
      Extra notes
      • Please use your full real name when posting. I am confident that "kuleuvenusers" is not your real name. See the Advice under FAQ for more.
      • Use the Code markup for readability. Toggle to the Advanced editor and find the # button.
      • Your first two lines of code are run together.
      • An & is missing in your last replace statement.
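
      For anyone who wants to try the whole workflow end to end, the points above can be sketched on the nlsw88 dataset shipped with Stata. This is an illustration under assumptions, not the original poster's data: industry and race stand in for industry and year, wage and hours stand in for total assets and margin, and the names cell, wagecat, hourscat and cat9 are made up for the example. The extra condition `y' < . is my addition: Stata treats missing values as larger than any number, so without it observations with a missing measure would end up in category 3.

      Code:
      ```stata
      sysuse nlsw88, clear

      // one index per industry-by-race cell; observations with a missing
      // industry or race get a missing cell and stay unclassified
      egen cell = group(industry race)

      foreach y in wage hours {
          generate `y'cat = 1 if cell < . & `y' < .
          summarize cell, meanonly
          local ncell = r(max)
          quietly forvalues i = 1/`ncell' {
              // skip cells with no usable values, where centile would fail
              count if cell == `i' & `y' < .
              if r(N) == 0 {
                  continue
              }
              centile `y' if cell == `i', centile(33 66)
              replace `y'cat = 2 if cell == `i' & `y' > r(c_1) & `y' < .
              replace `y'cat = 3 if cell == `i' & `y' > r(c_2) & `y' < .
          }
      }

      // combine the two 1-3 codes into a single index with at most 9 values
      egen cat9 = group(wagecat hourscat)
      tabulate wagecat hourscat
      ```

      Very small cells can leave one of the three categories empty, so it is worth inspecting the cell sizes before relying on the result.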



      • #4
        Thank you both. The last code worked perfectly for me, so that is what I used. I also apologize for not using my real name before; I only read the FAQs after making my account, so I have now created a new account under my real name.

        Best regards,
        Janthe



        • #5
          Thanks for the closure here.
