Hi,
This is my first time posting to this forum, despite having used it extensively over the years. I wanted to start by saying I am super grateful for all the advice here.
I am trying to figure out the most efficient way to find all occurrences of unique values of one variable across all values of another variable.
I need this to conduct a network analysis. The dataset is structured on the individual level, where one of the variables indicates the individual and another the manager of that individual. Since there is a time/organizational structure element to my data, individuals can also show up as managers for others. I would like to estimate the impact of having a better-connected manager (as measured by the number of interactions with other individuals in the dataset) on various outcomes, for example, how many individuals are managed by the individual.
The basic structure, including what I would like to obtain, would look like this:
Code:
* Example generated by -dataex-. For more info, type help dataex
clear
input str1(individual manager) float(individualmanages managermanages)
"A" "B" 1 2
"B" "D" 2 1
"C" "A" 0 1
"D" "F" 1 1
"E" "B" 0 2
end
Basically, I would like to find the total number of times each individual occurs in the manager variable (and, likewise, how often each manager occurs there). This can be done using a double loop over observations, but I have around half a million observations. By my calculations this would take around 50 days to run on my machine:
Code:
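gen individualmanages = 0
gen managermanages = 0
local N = _N
* Compare every observation i against every observation z
forvalues i = 1(1)`N' {
    forvalues z = 1(1)`N' {
        * Count rows in which this row's individual appears as a manager
        replace individualmanages = individualmanages + 1 if individual[`i'] == manager[`z'] in `i'
        * Count rows that share this row's manager
        replace managermanages = managermanages + 1 if manager[`i'] == manager[`z'] in `i'
    }
}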
I'm using Stata 16.1.
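To make the target concrete, here is a minimal sketch of one possible non-loop route, not a tested solution on the full data. It assumes that individualmanages and managermanages do not yet exist in the dataset (as in the loop above), that manager has no empty values that should be excluded, and that the code is run as a single do-file so the temporary file stays available. The tempfile name counts is just an illustrative choice.

```stata
* managermanages: number of rows sharing this row's manager
* (note: bysort re-sorts the data by manager)
bysort manager: gen managermanages = _N

* individualmanages: number of rows in which the individual appears as manager
preserve
contract manager, freq(individualmanages)   // one row per manager with its count
rename manager individual                   // so the counts can be matched on individual
tempfile counts
save `counts'
restore

* individual may repeat in the main data, hence m:1
merge m:1 individual using `counts', keep(master match) nogenerate

* individuals who never appear as a manager get a count of zero
replace individualmanages = 0 if missing(individualmanages)
```

On the five-row example above this should reproduce the two target columns, since both counts come from frequencies of the manager variable rather than from pairwise comparisons.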