Hi all,
I've hit a wall during my data analysis and I really need your expert help. I'm going through institutional ownership data and I need to compute indicator variables for the reporting institutions and following that, I will need to aggregate and compute summary statistics per each company per each reporting date. I am very new to Stata and I am really struggling making such computations. I do not really know what commands I should use to carry out the analysis in details hence I am asking you to kindly help me.
I described all the variables that I need to create and calculate, both dummies and continuous ones. I have just joined the blog so I hope what I am asking makes sense and that the info I gave you are sufficient. Possibly I would appreciate a code (or at least a part of it) which I could refine and apply to my dataset.
Thank you in advance to anyone who will help me and I am looking forward to your suggestions and contributions.
Here I present you with an example of the data (I created the data for the example)
file date: is the end of the quarter date at which the manager (institutional investor) has filed their long positions with the SEC. I collected the data starting from the quarter prior to the first date available in my sample, this is because as you can read below I need to compute a variable that indicates the investors' time horizon (1 or 0) and then I shall use this in subsequent computations.
mgrno: it is the unique manager identifier number assigned to each investor, the same investor can hold positions in multiple companies and across different periods of time as you can notice from the example data below.
mgr_type: it is a categorical number which identifies the type of investor. The main purpose of this analysis is to differentiate between Independent Institutions (mgr_type 3, 4 or 5) and Dependent ones (1 or 2). Also, I need to use mgr_type=3 in order to identify mutual funds (you can read more below)
company_id: it is the unique company identifier code used to identify the same company across different time periods.
shares owned: the number of shares owned by the investor in the specific company
total shares outstanding: it is the total number of shares outstanding reported by the company, it should be the same figure for every company across all the investors listed as owning shares in the company as of the file date.
I leave here a link to the original file from which I need to extract the data: https://docs.google.com/spreadsheets...f=true&sd=true
data:image/s3,"s3://crabby-images/d066f/d066fd85951785eca129cb4d47e80e968e3001bc" alt="Click image for larger version
Name: Schermata 2022-04-15 alle 17.01.12.png
Views: 1
Size: 269.6 KB
ID: 1659849"
What I need to do is, in the following order:
1) Compute dummies for identification of investment managers, identified by mgrno.
The resulting dataset should look similar to this: (it is an example from an excel, so do not mind the empty cells)
data:image/s3,"s3://crabby-images/618fb/618fb78becc422cf8253891285368109153139bf" alt="Click image for larger version
Name: Schermata 2022-04-15 alle 19.37.32.png
Views: 1
Size: 234.7 KB
ID: 1659850"
I've hit a wall during my data analysis and I really need your expert help. I'm going through institutional ownership data and I need to compute indicator variables for the reporting institutions and following that, I will need to aggregate and compute summary statistics per each company per each reporting date. I am very new to Stata and I am really struggling making such computations. I do not really know what commands I should use to carry out the analysis in details hence I am asking you to kindly help me.
I described all the variables that I need to create and calculate, both dummies and continuous ones. I have just joined the blog so I hope what I am asking makes sense and that the info I gave you are sufficient. Possibly I would appreciate a code (or at least a part of it) which I could refine and apply to my dataset.
Thank you in advance to anyone who will help me and I am looking forward to your suggestions and contributions.
Here I present you with an example of the data (I created the data for the example)
file date: is the end of the quarter date at which the manager (institutional investor) has filed their long positions with the SEC. I collected the data starting from the quarter prior to the first date available in my sample, this is because as you can read below I need to compute a variable that indicates the investors' time horizon (1 or 0) and then I shall use this in subsequent computations.
mgrno: it is the unique manager identifier number assigned to each investor, the same investor can hold positions in multiple companies and across different periods of time as you can notice from the example data below.
mgr_type: it is a categorical number which identifies the type of investor. The main purpose of this analysis is to differentiate between Independent Institutions (mgr_type 3, 4 or 5) and Dependent ones (1 or 2). Also, I need to use mgr_type=3 in order to identify mutual funds (you can read more below)
company_id: it is the unique company identifier code used to identify the same company across different time periods.
shares owned: the number of shares owned by the investor in the specific company
total shares outstanding: it is the total number of shares outstanding reported by the company, it should be the same figure for every company across all the investors listed as owning shares in the company as of the file date.
I leave here a link to the original file from which I need to extract the data: https://docs.google.com/spreadsheets...f=true&sd=true
What I need to do is, in the following order:
1) Compute dummies for identification of investment managers, identified by mgrno.
- investor_longterm_dummy: = 1 if the "mgrno" (the unique investor code) has held shares (shares owned>0) in the same company (identified by company_id) at the end of previous year (day: 31; month:12; year: year-1). Ex: if observation file date is 30/06/2012, then the dummy should be =1 if the same mgrno reported holding shares in the same company (company_id) as of the date (file date) 31/12/2011. (i.e., dummy=1 --> observation is classified as long-term)
- Independent_manager_dummy: = 1 if mgr_type= 3 or 4 or 5 otherwise if mgr_type=1 or 2 the dummy should be =0.
- Top5_dummy (also for Top 1, Top 10, Top 50): = 1 if the mgrno is listed as a top 5 owner (i.e., the amount of shares owned is among the largest 5 for the company as of the file date) in the company (company_id) as of the file date.
- longterm_Top5 dummy (also for Top 1, Top 10, Top 50): 1= if the same mgrno is classified as top 5 owner and long-term in each company (company_id) for every file date--> hence if investor_long-term_dummy=1 & Top5_dummy=1
- longterm&independent_Top5 dummy (also for Top 1, Top 10, Top 50): 1= if the same mgrno is classified as top 5 owner and long-term and independent, in each company (company_id) for every file date--> hence if investor_long-term_dummy=1 & Top5_dummy=1 & Independent_manager_dummy=1
- compute Block_Holder_dummy: = 1 if the "mgrno" holds an amount of shares (shares owned), as of the file date, in a specific company (company_id) which is >= 5% of total shares outstanding of the company as of the same date, then the observation shall receive a value of 1 (i.e., it is classified as a blockholder). Ex: Shareholder identified by mgrno 9811, owns about 1.2 millions shares in company 00817Y10 as of 31/03/2012 --> since its ownership stake is >=5% it is classified as a block holder.
- Total Ownership by Institutions: it is equal to the sum of the shares owned by all investors (all which are listed) per each company (company_id) and for every period (identified by the file date).
- Top 5 ownership size (continuous variable): it is equal to the sum of the shares owned by the largest 5 investors (ranked by shares owned or indicated by the top5 dummy) in each company (company_id) for every date (file date). I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.
- Ex: assuming investors, identified by mgrno, are ranked by shares owned in company_id as of file date: the Top 5 ownership variable should equal the sum of the shares owned by top 1 investor + top 2 investor +...+top5 investor in the specific company (company_id) as of the specific time/date (file date).
- Top 5 ownership by long-term managers (continuous variable): it is equal to the portion or amount of shares owned by the Top 5 investors in the company (company_id), as of the specific file date, which are indicated as long-term by investor_the long_term dummy=1. I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.
- Top 5 ownership size by mutual funds (continuous variable): it is equal to the portion or amount of shares owned by the Top 5 investors in the company (company_id) which are identified by mgrn_type=3, for each company and for every file date. I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.
- Top 5 ownership size by independent managers (continuous variable): should be equal to the sum of the shares owned by the portion of the top 5 investors (ranked by shares owned) which are indicated as being independent by the use of the independent_manager_dummy=1, for each company (company_id) and for every period (file date). I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.
- Top 5 ownership size by independent long-term managers (continuous variable): should be equal to the sum of the shares owned by the portion of the top 5 investors (ranked by shares owned) which are indicated as being independent and long-term by the use of the independent_manager_dummy=1 & investor_longterm_dummy=1, for each company (company_id) and for every period (file date). I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.
- Top 5 ownership size by short-term, dependent managers (continuous variable): should be equal to the sum of the shares owned by the portion of the top 5 investors (ranked by shares owned) which are indicated as being not independent and short-term by the use of the independent_manager_dummy=0 & investor_longterm_dummy=0, for each company (company_id) and for every period (file date). I need the same statistics for: Top 1 Ownership, Top 10 ownership, Top 50 ownership.
- Block_Holders ownership (continuous variable): should be equal to the sum of the shares owned by those investors who own >= 5% of total shares outstanding in the company (company_id) at each period (file date).
- Same as above:
- Long-term block holders ownership: should be equal to the sum of the shares owned by those investors who own >= 5% of total shares outstanding, and are classified as long-term by the investor_longterm_dummy=1, for each company (each unique company_id) for every date (file date).
- Long-term & Independent block holders ownership: should be equal to the sum of the shares owned by those investors who own >= 5% of total shares outstanding, and are classified as long-term and independent by the investor_longterm_dummy=1 & investor_longterm_dummy=1,for each company (each unique company_id) for every date (file date).
- Also, I need to compute new variables which count the number of all the investors per each company (company_id) for every file date, and count the number of block_holders (i.e., investors who own >= 5% of total shares outstanding) per each company for every file date. The first variable should be called No_13F_filers while the other should be called No_Blocks.
- Lastly, I will need to compute the percent change (% CHANGE) between the the No_13F_Filers at "t" and the No_13F_Filers at "t-1", I need to do the same for the No_Blocks.The percent change in both No_13F_Filers and No_Block_holders shall be computed for each company and for every file date. These variables should be called: CHANGE_all13F and CHANGE_Blocks.
The resulting dataset should look similar to this: (it is an example from an excel, so do not mind the empty cells)
Comment