I have a dataset in which firms can be owned by people (type 0) or by other firms (type 1). Here is an example:
Firms can be owned by other firms which in turn are owned by other firms and so on. For example, A is owned by B, B is owned by E and F, etc. I would like to reconstruct these ownership chains until I get to the ultimate owner(s) (a person, like I or J, or a firm with no owners in the data, like C, D, or H). The final output would look like this:
I thought vlookup would be a good place to start. With the following code I retrieve all intermediate and final owners (so I'm close to the goal), but the code is cumbersome and I can't see how to generalize it to an arbitrarily long ownership chain.
I was wondering if there is a better / more general way to achieve my goal? The original data is rather big (3M observations).
Also posted on https://stackoverflow.com/questions/...d-observations.
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str1(firmid ownerid) str7 last_name byte type "A" "B" "" 1 "A" "C" "" 1 "A" "D" "" 1 "B" "E" "" 1 "B" "F" "" 1 "E" "I" "Smith" 0 "E" "J" "Johnson" 0 "F" "G" "" 1 "G" "I" "Smith" 0 "G" "J" "Johnson" 0 "G" "H" "" 1 end
Code:
* Example generated by -dataex-. To install: ssc install dataex clear input str1(firmid ownerid1 ownerid2 ownerid3) str5 last_name1 str7 last_name2 byte(type1 type2 type3) str1(ultimate_owner1 ultimate_owner2 ultimate_owner3 ultimate_owner4 ultimate_owner5) "A" "B" "C" "D" "" "" 1 1 1 "I" "J" "H" "C" "D" "B" "E" "F" "" "" "" 1 1 . "I" "J" "H" "" "" "E" "I" "J" "" "Smith" "Johnson" 0 0 . "I" "J" "" "" "" "F" "G" "" "" "" "" 1 . . "I" "J" "H" "" "" "G" "I" "J" "H" "Smith" "Johnson" 0 0 1 "I" "J" "H" "" "" end
Code:
ssc inst missings by firmid, sort: g rep=_n reshape wide ownerid last_name share type, i(firmid) j(rep) qui forv j=1/3{ g rep_ownerid`j'=ownerid`j' forv i=1/3{ vlookup rep_ownerid`j', gen(ownerid`j'`i') key(firmid) value(ownerid`i') forv k=1/3{ vlookup ownerid`j'`i', gen(ownerid`j'`i'`k') key (firmid) value(ownerid`k') forv m=1/3{ replace ownerid`m'="99" if ownerid`m'=="" vlookup ownerid`j'`i'`k', gen(ownerid`j'`i'`k'`m') key (firmid) value(ownerid`m') } } } } qui foreach x of varlist ownerid* { replace `x'="" if `x'=="99" } qui missings dropvars, force drop rep*
Also posted on https://stackoverflow.com/questions/...d-observations.
Comment