Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reconstructing chain of linked observations

    I have a dataset in which firms can be owned by people (type 0) or by other firms (type 1). Here is an example:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1(firmid ownerid) str7 last_name byte type
    "A" "B" ""        1
    "A" "C" ""        1
    "A" "D" ""        1
    "B" "E" ""        1
    "B" "F" ""        1
    "E" "I" "Smith"   0
    "E" "J" "Johnson" 0
    "F" "G" ""        1
    "G" "I" "Smith"   0
    "G" "J" "Johnson" 0
    "G" "H" ""        1
    end
    Firms can be owned by other firms which in turn are owned by other firms and so on. For example, A is owned by B, B is owned by E and F, etc. I would like to reconstruct these ownership chains until I get to the ultimate owner(s) (a person, like I or J, or a firm with no owners in the data, like C, D, or H). The final output would look like this:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1(firmid ownerid1 ownerid2 ownerid3) str5 last_name1 str7 last_name2 byte(type1 type2 type3) str1(ultimate_owner1 ultimate_owner2 ultimate_owner3 ultimate_owner4 ultimate_owner5)
    "A" "B" "C" "D" ""      ""        1 1 1 "I" "J" "H" "C" "D"
    "B" "E" "F" ""  ""      ""        1 1 . "I" "J" "H" ""  ""
    "E" "I" "J" ""  "Smith" "Johnson" 0 0 . "I" "J" ""  ""  ""
    "F" "G" ""  ""  ""      ""        1 . . "I" "J" "H" ""  ""
    "G" "I" "J" "H" "Smith" "Johnson" 0 0 1 "I" "J" "H" ""  ""
    end
    I thought vlookup would be a good place to start. With the following code I retrieve all intermediate and final owners (so I'm close to the goal), but the code is cumbersome and I can't see how to generalize it to an arbitrarily long ownership chain.
    Code:
    ssc inst missings
     by firmid, sort: g rep=_n
    reshape wide ownerid last_name share type, i(firmid) j(rep)  
    qui forv j=1/3{    
    g rep_ownerid`j'=ownerid`j'
    forv i=1/3{    
    vlookup rep_ownerid`j', gen(ownerid`j'`i') key(firmid) value(ownerid`i')
    forv k=1/3{    
    vlookup ownerid`j'`i', gen(ownerid`j'`i'`k') key (firmid) value(ownerid`k')
    forv m=1/3{    
    replace ownerid`m'="99" if ownerid`m'==""    
    vlookup ownerid`j'`i'`k', gen(ownerid`j'`i'`k'`m') key (firmid) value(ownerid`m')
    }
    }
    }
    }  
    
    qui foreach x of varlist ownerid* {
    replace `x'="" if `x'=="99"
    }  
    
    qui missings dropvars, force
    drop rep*
    I was wondering if there is a better / more general way to achieve my goal? The original data is rather big (3M observations).
    Also posted on https://stackoverflow.com/questions/...d-observations.
    Last edited by Valentina Rutigliano; 12 Mar 2021, 12:43.

  • #2
    Well, I can get you part way there. To be honest, I do not understand the desired results you show. There are many chains of ownership, that sometimes branch and recombine, and your results do not reflect that. But perhaps from what I show you, you will see how to refine it to your desired result.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str1(firmid ownerid) str7 last_name byte type
    "A" "B" ""        1
    "A" "C" ""        1
    "A" "D" ""        1
    "B" "E" ""        1
    "B" "F" ""        1
    "E" "I" "Smith"   0
    "E" "J" "Johnson" 0
    "F" "G" ""        1
    "G" "I" "Smith"   0
    "G" "J" "Johnson" 0
    "G" "H" ""        1
    end
    
    preserve
    rename firmid link
    keep link ownerid last_name
    tempfile copy
    save `copy'
    
    restore
    keep firmid
    duplicates drop
    
    
    clonevar link = firmid
    joinby link using `copy', unmatched(master)
    drop _merge
    duplicates drop
    rename (ownerid last_name) =_lvl_1
    local lvl = 1
    
    local goon = 1
    while (`goon') {
        replace link = ownerid_lvl_`lvl'
        joinby link using `copy', unmatched(master)
        count if _merge == 3
        if r(N) == 0 {
            local goon 0
            drop ownerid last_name
        }
        else {
            duplicates drop
            local ++lvl
            rename (ownerid last_name) =_lvl_`lvl'
        }
        drop _merge
    }
    By the way, one trap you may fall into: if there are cycles, such as where A is an owner of B, and B is an owner of C, and C is an owner of A, this program will hang in an infinite loop.
    Last edited by Clyde Schechter; 12 Mar 2021, 18:32.

    Comment


    • #3
      Thanks a lot, Clyde. I wasn't familiar with the while and if functions. It took me a while to absorb your solution but it definitely solves the main difficulty which is not knowing in advance how many "levels" of ownership there are.
      I am aware of the fact that cross-ownership would generate an infinite loop, as you pointed out. But I'm not sure there is a solution to that problem.

      Comment

      Working...
      X