
  • Total of All Occurrences of Value in Another Variable

    Hi,

    this is my first time posting to this forum despite extended use throughout the years. Wanted to start by saying I am super grateful for all the advice here.

    I am trying to figure out the most efficient way to find all occurrences of unique values of one variable across all values of another variable.

    I need this to conduct a network analysis. The dataset is structured on the individual level, where one of the variables indicates the individual and another the manager of that individual. Since there is a time/organizational structure element to my data, individuals can also show up as managers for others. I would like to estimate the impact of having a better-connected manager (as measured by the number of interactions with other individuals in the dataset) on various outcomes, for example, how many individuals are managed by the individual.

    The basic structure, including the two variables I would like to obtain (individualmanages and managermanages), looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1(individual manager) float(individualmanages managermanages)
    "A" "B" 1 2
    "B" "D" 2 1
    "C" "A" 0 1
    "D" "F" 1 1
    "E" "B" 0 2
    end

    Basically, I would like to count, for each individual and for each manager, how many times that value occurs in manager. This can be done with a double loop over observations, but I have around half a million observations. By my calculations, this would take around 50 days to run on my machine:

    Code:
    gen individualmanages = 0
    gen managermanages = 0
    
    local N = _N
    
    forvalues i = 1(1)`N' {
        forvalues z = 1(1)`N' {
            replace individualmanages = individualmanages + 1 if individual[`i'] == manager[`z'] in `i'
            replace managermanages = managermanages + 1 if manager[`i'] == manager[`z'] in `i'
        }
    }

    Is anyone aware of a better approach? I couldn't find a solution to this problem on the forum. Thanks a ton.

    I'm using Stata 16.1.

  • #2
    This may start you in a useful direction.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str1(individual manager) float(individualmanages managermanages)
    "A" "B" 1 2
    "B" "D" 2 1
    "C" "A" 0 1
    "D" "F" 1 1
    "E" "B" 0 2
    end
    
    frame put manager, into(m)
    frame m {
       gen imcount = 1
       collapse (sum) imcount, by(manager)
    }
    
    frlink 1:1 individual, frame(m manager) generate(imlink)
    frget im=imcount, from(imlink)
    replace im = 0 if im==.
    drop imlink
    
    frlink m:1 manager, frame(m manager) generate(mmlink)
    frget mm=imcount, from(mmlink)
    drop mmlink
    
    list, clean abbreviate(20)
    Code:
    . list, clean abbreviate(20)
    
           individual   manager   individualmanages   managermanages   im   mm  
      1.            A         B                   1                2    1    2  
      2.            B         D                   2                1    2    1  
      3.            C         A                   0                1    0    1  
      4.            D         F                   1                1    1    1  
      5.            E         B                   0                2    0    2  
    
    .
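
    For anyone who cannot use frames (they require Stata 16 or later), the same counts can probably be obtained with -bysort-, -contract- and -merge-. This is only a minimal sketch, assuming the variable names from the example data above; im2 and mm2 are hypothetical names chosen so they do not clash with im and mm.
    Code:
    * Sketch only: assumes the example data above is in memory.
    * im2 and mm2 are illustrative names, not part of the original data.

    * Number of observations sharing this observation's manager
    bysort manager: generate mm2 = _N

    * Build a lookup file of counts per manager, keyed as "individual"
    preserve
    contract manager, freq(im2)
    rename manager individual
    tempfile counts
    save `counts'
    restore

    * Attach each individual's count; individuals who never appear as a manager get 0
    merge m:1 individual using `counts', keep(master match) nogenerate
    replace im2 = 0 if missing(im2)

    list, clean abbreviate(20)
    On the example data, im2 and mm2 should match im and mm. Because the merge is m:1 rather than 1:1, it should also tolerate an individual appearing in more than one observation, which may matter in the full dataset. Note that -bysort- and -merge- change the sort order of the data.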


    • #3
      Thank you, William, this is very efficient and not a method I was aware of before. Happy to see frames can be used in such a useful manner. I implemented it almost verbatim and it works for my purposes.
