Create summary statistics and indicator variables for a dataset of institutional ownership data

Emanuele Crocetta

Join Date: Apr 2022

Posts: 7
#1

Create summary statistics and indicator variables for a dataset of institutional ownership data

15 Apr 2022, 13:38

Hi all,

I've hit a wall during my data analysis and I really need your expert help. I'm going through institutional ownership data and I need to compute indicator variables for the reporting institutions and following that, I will need to aggregate and compute summary statistics per each company per each reporting date. I am very new to Stata and I am really struggling making such computations. I do not really know what commands I should use to carry out the analysis in details hence I am asking you to kindly help me.

I described all the variables that I need to create and calculate, both dummies and continuous ones. I have just joined the blog so I hope what I am asking makes sense and that the info I gave you are sufficient. Possibly I would appreciate a code (or at least a part of it) which I could refine and apply to my dataset.

Thank you in advance to anyone who will help me and I am looking forward to your suggestions and contributions.

Here I present you with an example of the data (I created the data for the example)

file date: is the end of the quarter date at which the manager (institutional investor) has filed their long positions with the SEC. I collected the data starting from the quarter prior to the first date available in my sample, this is because as you can read below I need to compute a variable that indicates the investors' time horizon (1 or 0) and then I shall use this in subsequent computations.
mgrno: it is the unique manager identifier number assigned to each investor, the same investor can hold positions in multiple companies and across different periods of time as you can notice from the example data below.
mgr_type: it is a categorical number which identifies the type of investor. The main purpose of this analysis is to differentiate between Independent Institutions (mgr_type 3, 4 or 5) and Dependent ones (1 or 2). Also, I need to use mgr_type=3 in order to identify mutual funds (you can read more below)
company_id: it is the unique company identifier code used to identify the same company across different time periods.
shares owned: the number of shares owned by the investor in the specific company
total shares outstanding: it is the total number of shares outstanding reported by the company, it should be the same figure for every company across all the investors listed as owning shares in the company as of the file date.

I leave here a link to the original file from which I need to extract the data: https://docs.google.com/spreadsheets...f=true&sd=true

What I need to do is, in the following order:

1) Compute dummies for identification of investment managers, identified by mgrno.
investor_longterm_dummy: = 1 if the "mgrno" (the unique investor code) has held shares (shares owned>0) in the same company (identified by company_id) at the end of previous year (day: 31; month:12; year: year-1). Ex: if observation file date is 30/06/2012, then the dummy should be =1 if the same mgrno reported holding shares in the same company (company_id) as of the date (file date) 31/12/2011. (i.e., dummy=1 --> observation is classified as long-term)

Independent_manager_dummy: = 1 if mgr_type= 3 or 4 or 5 otherwise if mgr_type=1 or 2 the dummy should be =0.

Top5_dummy (also for Top 1, Top 10, Top 50): = 1 if the mgrno is listed as a top 5 owner (i.e., the amount of shares owned is among the largest 5 for the company as of the file date) in the company (company_id) as of the file date.

longterm_Top5 dummy (also for Top 1, Top 10, Top 50): 1= if the same mgrno is classified as top 5 owner and long-term in each company (company_id) for every file date--> hence if investor_long-term_dummy=1 & Top5_dummy=1

longterm&independent_Top5 dummy (also for Top 1, Top 10, Top 50): 1= if the same mgrno is classified as top 5 owner and long-term and independent, in each company (company_id) for every file date--> hence if investor_long-term_dummy=1 & Top5_dummy=1 & Independent_manager_dummy=1

compute Block_Holder_dummy: = 1 if the "mgrno" holds an amount of shares (shares owned), as of the file date, in a specific company (company_id) which is >= 5% of total shares outstanding of the company as of the same date, then the observation shall receive a value of 1 (i.e., it is classified as a blockholder). Ex: Shareholder identified by mgrno 9811, owns about 1.2 millions shares in company 00817Y10 as of 31/03/2012 --> since its ownership stake is >=5% it is classified as a block holder.

2) Compute summary statistics which aggregate shareholdings (shares owned) per each company (company_id), as of each date (file date), by mgr_type and different combinations of the dummies listed above, and by both Top5, Top1, Top10 and Top50 investors (mgrno) in terms of the relative amount of shares owned
Total Ownership by Institutions: it is equal to the sum of the shares owned by all investors (all which are listed) per each company (company_id) and for every period (identified by the file date).

Top 5 ownership size (continuous variable): it is equal to the sum of the shares owned by the largest 5 investors (ranked by shares owned or indicated by the top5 dummy) in each company (company_id) for every date (file date). I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.

Ex: assuming investors, identified by mgrno, are ranked by shares owned in company_id as of file date: the Top 5 ownership variable should equal the sum of the shares owned by top 1 investor + top 2 investor +...+top5 investor in the specific company (company_id) as of the specific time/date (file date).

Top 5 ownership by long-term managers (continuous variable): it is equal to the portion or amount of shares owned by the Top 5 investors in the company (company_id), as of the specific file date, which are indicated as long-term by investor_the long_term dummy=1. I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.

Top 5 ownership size by mutual funds (continuous variable): it is equal to the portion or amount of shares owned by the Top 5 investors in the company (company_id) which are identified by mgrn_type=3, for each company and for every file date. I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.

Top 5 ownership size by independent managers (continuous variable): should be equal to the sum of the shares owned by the portion of the top 5 investors (ranked by shares owned) which are indicated as being independent by the use of the independent_manager_dummy=1, for each company (company_id) and for every period (file date). I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.

Top 5 ownership size by independent long-term managers (continuous variable): should be equal to the sum of the shares owned by the portion of the top 5 investors (ranked by shares owned) which are indicated as being independent and long-term by the use of the independent_manager_dummy=1 & investor_longterm_dummy=1, for each company (company_id) and for every period (file date). I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.

Top 5 ownership size by short-term, dependent managers (continuous variable): should be equal to the sum of the shares owned by the portion of the top 5 investors (ranked by shares owned) which are indicated as being not independent and short-term by the use of the independent_manager_dummy=0 & investor_longterm_dummy=0, for each company (company_id) and for every period (file date). I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.

Block_Holders ownership (continuous variable): should be equal to the sum of the shares owned by those investors who own >= 5% of total shares outstanding in the company (company_id) at each period (file date).

Same as above:

Long-term block holders ownership: should be equal to the sum of the shares owned by those investors who own >= 5% of total shares outstanding, and are classified as long-term by the investor_longterm_dummy=1, for each company (each unique company_id) for every date (file date).

Long-term & Independent block holders ownership: should be equal to the sum of the shares owned by those investors who own >= 5% of total shares outstanding, and are classified as long-term and independent by the investor_longterm_dummy=1 & investor_longterm_dummy=1,for each company (each unique company_id) for every date (file date).

Also, I need to compute new variables which count the number of all the investors per each company (company_id) for every file date, and count the number of block_holders (i.e., investors who own >= 5% of total shares outstanding) per each company for every file date. The first variable should be called No_13F_filers while the other should be called No_Blocks.

Lastly, I will need to compute the percent change (% CHANGE) between the the No_13F_Filers at "t" and the No_13F_Filers at "t-1", I need to do the same for the No_Blocks.The percent change in both No_13F_Filers and No_Block_holders shall be computed for each company and for every file date. These variables should be called: CHANGE_all13F and CHANGE_Blocks.

3) The last thing I need is to keep only the observations which contain the company_id, the file date, the total shares outstanding, all the aggregate holdings (shares owned) by type which have been listed above in step 2, the variables No_13_Filers and No_Blocks and lastly, the variables which include the percent changes, CHANGE_all13F and CHANGE_Blocks.

The resulting dataset should look similar to this: (it is an example from an excel, so do not mind the empty cells)
Tags: data analysis, data management, interaction, panel data, Time Series
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17613
#2

16 Apr 2022, 02:43

Emanuele:
there's whiff of class/home assignement here (please see https://www.statalist.org/forums/help#adviceextras, #4. Thanks).

Kind regards,
Carlo
(StataNow 18.5)
Comment
Emanuele Crocetta

Join Date: Apr 2022

Posts: 7
#3

17 Apr 2022, 20:11

Hi Carlo,

Thanks for the heads up, I shall look into the advices directory first next time!

I was wondering if you could help me figuring out how to keep the last 5 observations for each company (unique_id) for every date (file_date). I am basically working with the same dataset as I posted before but I need to do just this quick thing. Since I have the data arranged in a panel data format, I tried to run the following command:

bysort FileDate Cusip SharesHeldatEndofQtr: keep in -5/-1
'in' may not be combined with 'by' r(190);

BUT IT DOES NOT WORK and I get this error ('in' may not be combined with 'by' r(190). I managed to sort the data in the correct order (first by date, then by company_id and lastly by ownership size) but I am having trouble in dropping the observations which are not in the top 5 (in this case the bottom 5 as sort works in ascending order) because the in range option apparently does not work with the by command. Do you know of any solution?

Thanks in advance for the help!

Last edited by Emanuele Crocetta; 17 Apr 2022, 20:19.
Comment
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17613
#4

18 Apr 2022, 05:24

Emanuele:
philosophycally speaking, your next time should have been before your last post.
Please note that deciding not to abide by the forum rules is as legal as ignoring the queries of those acting that way. Thanks.
That said, the following toy.example can give you some guidance (or hopefully so):

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta" (National Longitudinal Survey of Young Women, 14-24 years old in 1968) . bysort idcode lnwage: drop if _n>=5

Kind regards,
Carlo
(StataNow 18.5)
1 like
Comment
Emanuele Crocetta

Join Date: Apr 2022

Posts: 7
#5

18 Apr 2022, 11:24

Thank you Carlo, I had overlooked the the forum rules as I was in a hurry but I'll keep that well in mind as promised!
Comment

Announcement

Create summary statistics and indicator variables for a dataset of institutional ownership data

Comment

Comment

Comment

Comment