Total of All Occurrences of Value in Another Variable

Gailius Praninskas

Join Date: Feb 2020

Posts: 2
#1

30 Jan 2022, 07:13

Hi,

This is my first time posting to this forum, despite using it extensively over the years. I wanted to start by saying I am super grateful for all the advice here.

I am trying to figure out the most efficient way to find all occurrences of unique values of one variable across all values of another variable.

I need this to conduct a network analysis. The dataset is structured on the individual level, where one of the variables indicates the individual and another the manager of that individual. Since there is a time/organizational structure element to my data, individuals can also show up as managers for others. I would like to estimate the impact of having a better-connected manager (as measured by the number of interactions with other individuals in the dataset) on various outcomes, for example, how many individuals are managed by the individual.

The basic structure, including what I would like to obtain, would look like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1(individual manager) float(individualmanages managermanages)
"A" "B" 1 2
"B" "D" 2 1
"C" "A" 0 1
"D" "F" 1 1
"E" "B" 0 2
end

Basically, I would like to find the total number of times each individual occurs in manager. This can be done with a double loop over observations, but I have around half a million observations. By my calculations, this would take around 50 days to run on my machine:

Code:

gen individualmanages = 0
gen managermanages = 0
local N = _N
* compare every observation i with every observation z
forvalues i = 1(1)`N' {
    forvalues z = 1(1)`N' {
        replace individualmanages = individualmanages + 1 if individual[`i'] == manager[`z'] in `i'
        replace managermanages = managermanages + 1 if manager[`i'] == manager[`z'] in `i'
    }
}

I was wondering whether someone is aware of a better approach? I couldn't find a solution to this problem on the forum. Thanks a ton.

I'm using Stata 16.1.

William Lisowski

Join Date: Dec 2014
Posts: 10150

30 Jan 2022, 09:42

This may start you in a useful direction.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str1(individual manager) float(individualmanages managermanages)
"A" "B" 1 2
"B" "D" 2 1
"C" "A" 0 1
"D" "F" 1 1
"E" "B" 0 2
end

frame put manager, into(m)
frame m {
   gen imcount = 1
   collapse (sum) imcount, by(manager)
}

frlink 1:1 individual, frame(m manager) generate(imlink)
frget im=imcount, from(imlink)
replace im = 0 if im==.
drop imlink

frlink m:1 manager, frame(m manager) generate(mmlink)
frget mm=imcount, from(mmlink)
drop mmlink

list, clean abbreviate(20)

Code:

. list, clean abbreviate(20)

       individual   manager   individualmanages   managermanages   im   mm  
  1.            A         B                   1                2    1    2  
  2.            B         D                   2                1    2    1  
  3.            C         A                   0                1    0    1  
  4.            D         F                   1                1    1    1  
  5.            E         B                   0                2    0    2  

.
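For comparison, the same two counts can probably also be obtained without frames. The following is only a sketch of a different technique (by-group counts plus a merge), assuming the example data above are in memory; the variable names mm2 and im2 are made up here for illustration and are not from the code above.

Code:

```stata
* managermanages: number of observations sharing this observation's manager
bysort manager: generate mm2 = _N

* individualmanages: count observations per manager, then match the
* counts back onto individual
preserve
contract manager, freq(im2)          // one row per manager with its frequency
rename manager individual            // so the key matches individual
tempfile counts
save `counts'
restore

* m:1 allows an individual to appear on several rows of the main data
merge m:1 individual using `counts', keep(master match) nogenerate
replace im2 = 0 if missing(im2)      // individuals who never appear as manager

list, clean abbreviate(20)
```

On the five-row example this should reproduce individualmanages and managermanages; on real data, check how missing values of manager should be treated before relying on it.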


Gailius Praninskas

Join Date: Feb 2020

Posts: 2
#3

30 Jan 2022, 14:28

Thank you, William, this is very efficient and not a method I was aware of before. Happy to see frames can be used in such a useful manner. I implemented it almost verbatim and it works for my purposes.
