identifying indirect relationships

Paula de Souza Leao Spinola

Join Date: Jun 2015
Posts: 384

#16

05 Feb 2019, 01:51

Many thanks Mike,
In the code you proposed, the variable "batch" takes the same value for the whole dataset, which in this case is 7. Hence, all observations are deleted in the loop as soon as we keep only the observations where batch == 1 (and the same would happen for batch = 2, ..., 6, 8, 9, 10).

Code:

. gen byte batch = 1 + mod(_N,10)

. tab batch

      batch |      Freq.     Percent        Cum.
------------+-----------------------------------
          7 | 12,033,366      100.00      100.00
------------+-----------------------------------
      Total | 12,033,366      100.00

. display _N
12033366

. preserve

. 
. forval i = 1/10 {
  2. 
.   keep if (batch == `i')
  3. 
.   joinby alter_id using colleagues.dta
  4. 
.   save batch`i'
  5. 
.   restore
  6. 
. }
(12,033,366 observations deleted)


Mike Lacy

Join Date: Apr 2014

Posts: 2404
#17

05 Feb 2019, 09:44

Sorry for that typographical error: it should be _n, the observation number, not _N, the total number of observations.

Code:

gen byte batch = 1 + mod(_n,10)
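
The difference is easy to check: _N is the same number in every observation, so any expression built only from it is constant, whereas _n changes from row to row. A quick illustration, using the 12,033,366 observations reported in #16 and a toy dataset whose size (25 observations) is an arbitrary choice for demonstration only:

```stata
* _N is constant across observations, so 1 + mod(_N,10) is constant too
* (this displays 7, the single value seen in -tab batch- in #16)
display 1 + mod(12033366, 10)

* toy data: 25 observations is an illustrative size, not from the thread
clear
set obs 25
* batch_N is hypothetical: identical in every row
gen byte batch_N = 1 + mod(_N, 10)
* batch_n is hypothetical: cycles through the values 1 to 10
gen byte batch_n = 1 + mod(_n, 10)
tab batch_N
tab batch_n
```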

Any method of creating a variable that divides the dataset into smaller groups is ok here. Another option is:

Code:

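* long rather than int: _n exceeds the int maximum (32,740) in a dataset this size
gen long temp = _n
* note: group(10) codes the groups 0, 1, ..., 9 rather than 1, ..., 10
egen batch = cut(temp), group(10)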
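
Whichever way batch is created, the loop shown in #16 calls -preserve- once but -restore- on every pass, and a plain -restore- discards the preserved copy, so the second pass would have nothing to restore. A minimal sketch of one way around this, assuming the file colleagues.dta and the key alter_id from #16 and a batch variable coded 1 to 10 (the -replace- option on -save- is an addition here, so that the loop can be rerun):

```stata
preserve
forvalues i = 1/10 {
    * keep one batch, join it, and write the result to its own file
    keep if batch == `i'
    joinby alter_id using colleagues.dta
    save batch`i', replace
    * bring back the full dataset and keep it preserved for the next pass
    restore, preserve
}
restore
```

The ten batch files can then be combined with -append- if a single result file is needed.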
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#18

05 Feb 2019, 21:39

With "current" meaning "within the given period only", the puzzle becomes much simpler. As regards "different", to me Paula's trial code in #14 implies that the focus is on identifying the (indirect) relation itself rather than the (indirect) person. So if a physician has two or more indirect relations with a person, some of which occurred in the (same) current hospital, the code would not drop this (indirect) person, but only the untargeted relation(s). A little more clarity on this point might therefore still be needed. For coding purposes, though, this issue (if it is one) should not be tricky.

The code below, which uses -rangejoin-, a wonderful package by Robert Picard, is expected to work for large datasets.

Code:

clear
input float period str1 phy_id str2 hosp
1 "A" "UU"
1 "G" "UU"
1 "B" "VV"
1 "C" "VV"
1 "D" "WW"
1 "F" "WW"
1 "G" "WW"
1 "D" "XX"
1 "E" "XX"
2 "A" "YY"
2 "B" "YY"
2 "D" "ZZ"
2 "F" "ZZ"
3 "A" "ZZ"
3 "D" "ZZ"
end

tempfile original output
save `original', replace

ren phy_id d_phy_id
joinby using `original'
drop if phy_id == d_phy_id
save `output', replace

rangejoin period . -1 using `output', by(d_phy_id) p(i_)
drop if i_hosp == hosp | phy_id == i_phy_id
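
Note that -rangejoin- is a community-contributed command, so it has to be installed before the code above will run. It appears to be distributed through SSC and to depend on -rangestat-; the setup below is a sketch under that assumption and should be checked against the help file:

```stata
* one-time installation (assumes both packages are available on SSC)
ssc install rangestat
ssc install rangejoin

* the help file documents the keyvar/low/high arguments used in the call above
help rangejoin
```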

Last edited by Romalpa Akzo; 05 Feb 2019, 21:45.