Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • boardex: data treatment

    Dear all,

    I'm having some issues on importing data from the database Boardex, and I was wondering if you could help me out to prepare the data.
    Boardex is a database with some corporate info for a set of companies and is provided in excel files. Depending on the specific file, there are different types of information being provided but the way it is organized in terms of identifying the firm and director/manager/etc is always the same. And here lies my problem. Here goes a small extract of the data just to show how it is organized:
    Annual Report Date Company Name Company ID Company ISIN Company Ticker Company Index Sector Country DirectorName Director ID Variables of interest 1 Variables of interest 2
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Martin Michael Gagen 1466510920 1 11
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Peter Brian Williams 1466610928 2 12
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Brian Paul Larcombe 400911884 3 13
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Doctor Richard Douglas Summers 60765967 4 14
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Michael James Queen 663810061 5 15
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England John William Melbourn 103147209 6 16
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Baroness (Sarah Mary) Hogg 130078114 7 17
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Doctor John Richard Forrest 143148447 8 18
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England The Rt. Hon. Lord Ralph Thomas Camoys 7662746 9 19
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61, GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Sir George Russell 819911086 10 20
    So, for each year (Annual Report Date), we have the companyID (Boardex specific id), Company ISIN (identifier that can be used to merge with other databases), Director ID/DirectorName, and few Variables of interest that I do not include for copyright issues.

    Given that I need to merge the data with a different database, ISIN is the identifier to be used. But, as you can see, for this company, we have two ISINs in the Company ISIN variable. Other companies have more, others one or even none (these last ones are not important as they will be deleted). Basically i would need to parse the Company ISIN field identifying the (variable) ISINs, and put them in one variable (say ISIN). This would require to repeat the information of all the other variables for each of the new identified and parsed ISINs. Is this possible to be done in Stata? I know that I would need to reshape the data, but the problem is the Company ISIN variable with all those ISINs

    The data would follow this scheme:
    Annual Report Date Company Name Company ID Company ISIN Company Ticker Company Index Sector Country DirectorName Director ID Variables of interest 1 Variables of interest 2
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Martin Michael Gagen 1.47E+09 1 11
    Mar-99 3I GROUP PLC 11523753 GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Martin Michael Gagen 1.47E+09 1 11
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Peter Brian Williams 1.47E+09 2 12
    Mar-99 3I GROUP PLC 11523753 GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Peter Brian Williams 1.47E+09 2 12
    Mar-99 3I GROUP PLC 11523753 GB00B16PRC61 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Brian Paul Larcombe 4.01E+08 3 13
    Mar-99 3I GROUP PLC 11523753 GB00B1YW4409 III, IIIP FTSE 250(GBP) Private Equity United Kingdom - England Brian Paul Larcombe 4.01E+08 3 13


    Best wishes,

    Nuno

  • #2
    As a start, the following will separate out your ISIN field into distinct ISINs (assuming each is 30 characters long or less).
    Code:
    clear
    input str8 companyid str90 isin
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    "11523753" "GB00B16PRC61, GB00B1YW4409"
    end
    generate str90 isinc = subinstr(isin,",","",.)
    generate str30 isin1 = word(isinc,1)
    generate str30 isin2 = word(isinc,2)
    generate str30 isin3 = word(isinc,3)
    drop isin isinc
    list, clean noobs
    Code:
        compan~d          isin1          isin2   isin3  
        11523753   GB00B16PRC61   GB00B1YW4409          
        11523753   GB00B16PRC61   GB00B1YW4409          
        11523753   GB00B16PRC61   GB00B1YW4409          
        11523753   GB00B16PRC61   GB00B1YW4409          
        11523753   GB00B16PRC61   GB00B1YW4409          
        11523753   GB00B16PRC61   GB00B1YW4409          
        11523753   GB00B16PRC61   GB00B1YW4409          
        11523753   GB00B16PRC61   GB00B1YW4409          
        11523753   GB00B16PRC61   GB00B1YW4409          
        11523753   GB00B16PRC61   GB00B1YW4409

    Comment


    • #3
      Dear William,

      Thank you for your reply! it helped to start the code going and realizing I was going the wrong way! I decided that I would be better to create a master link file instead of trying to do everything at one.

      the working code is something like this:

      Code:
      import excel boardex_file.xlsx, sheet("Summary") firstrow case(lower) clear
      
      keep companyid annualreportdate companyisin 
      generate str isinc = subinstr( companyisin ,",","",.)
      gen isin_count=wordcount(isinc)
      sum isin_count
      
      forvalues i=1/`r(max)'{
          generate str isin`i'= word(isinc,`i')
      }
      
      drop isinc companyisin 
      
      sort companyid annualreportdate
      by companyid annualreportdate: keep if _n==1
      
      reshape long isin, i(companyid annualreportdate) j(n_isin)
      keep if !missing(isin)
      I now have a another issue that relates on saving the different dta files. The original xlsx file has several worksheets that need to be imported, one at the time, for the different year files. I've written a bit of code for this as follows:

      Code:
      cd "D:\boardex"
      
      fs *.xlsx
      
      foreach f in `r(files)'{
          import excel `f', describe
          forvalues j=1 (1) `r(N_worksheet)'{
              di "`r(worksheet_`j')'"
              import excel `f', sheet(`r(worksheet_`j')') firstrow case(lower) clear
              save `f'_`r(worksheet_`j')'.dta, replace
          }
      }
      The problem is saving the dta file. The code seems to run fine for the first worksheets but than stops:

      Code:
      Sheet  Range
      --------------------------------+--------------------------------
      Summary  A1:P3887
      Direct  A1:P3887
      Equity Linked Summary  A1:Q3887
      Equity Linked Option  A1:Y415
      Equity Linked LTIP Cash Plan  A1:O3
      Equity Linked LTIP Option Plan  A1:Y818
      Equity Linked LTIP Share Plan  A1:R517
      LTIP Share Matching Plan  A1:R59
      Accumulated Wealth Summary  A1:R3887
      Accumulated Wealth Options  A1:Z2351
      Acc. Wealth LTIP Cash Plan  A1:O12
      Acc. Wealth LTIP Option Plan  A1:Z2768
      Acc. Wealth LTIP Share Plan  A1:R818
      Acc.Wealth LTIP Share Mth. Plan  A1:R88
      Summary
      (note: file uk1999.xlsx_Summary.dta not found)
      file uk1999.xlsx_Summary.dta saved
      Direct
      (note: file uk1999.xlsx_Direct.dta not found)
      file uk1999.xlsx_Direct.dta saved
      Equity Linked Summary
      invalid 'Linked' 
      r(198);
      
      end of do-file
      
      r(198);
      it seems Stata is having a problem saving Equity Linked Summary. Any idea why?

      Comment


      • #4
        Well, I'm answering myself but the correct code is

        Code:
        fs *.xlsx
        
        foreach f in `r(files)'{
            import excel `f', describe
            forvalues j=1 (1) `r(N_worksheet)'{
                di "`r(worksheet_`j')'"
                import excel `f', sheet(`r(worksheet_`j')') firstrow case(lower) clear
                local fname=subinstr(`"`r(worksheet_`j')'"'," ","",.)
                save `f'_`fname'.dta, replace
            }
        }

        Comment

        Working...
        X