  • #16
    Many thanks Mike,
    In the code you proposed, the variable "batch" takes the same value for every observation, which in this case is 7, because _N is the total number of observations (12,033,366) and 1 + mod(12033366,10) = 7. Hence, all observations are deleted in the loop whenever we keep only the observations where batch = 1, ..., 6, 8, 9, or 10.

    Code:
    . gen byte batch = 1 + mod(_N,10)
    
    . tab batch
    
          batch |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              7 | 12,033,366      100.00      100.00
    ------------+-----------------------------------
          Total | 12,033,366      100.00
    
    . display _N
    12033366
    
    . preserve
    
    . 
    . forval i = 1/10 {
      2. 
    .   keep if (batch == `i')
      3. 
    .   joinby alter_id using colleagues.dta
      4. 
    .   save batch`i'
      5. 
    .   restore
      6. 
    . }
    (12,033,366 observations deleted)



    • #17
      Sorry for that typographical error: it should be _n, the observation number, not _N, the total number of observations.
      Code:
      gen byte batch = 1 + mod(_n,10)
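      Any method of creating a variable that divides the dataset into smaller groups is OK here. Another option is:
      Code:
      * long rather than int: int cannot hold observation numbers above 32,740
      gen long temp = _n
      egen batch = cut(temp), group(10)
      * cut() with group() codes the groups 0-9, so shift them to 1-10
      replace batch = batch + 1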
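      Putting this together with the loop from #16, here is a minimal sketch of the batching as I understand it. It is untested on the real data, and the file and variable names are simply those shown in #16. Note that -preserve- has to sit inside the loop, because each -restore- uses up the preserved copy, and that -save- needs -replace- if the batch files already exist.
      Code:
      gen byte batch = 1 + mod(_n, 10)
      forvalues i = 1/10 {
          * keep a copy of the full data for the next pass
          preserve
          keep if batch == `i'
          joinby alter_id using colleagues.dta
          save batch`i', replace
          * bring the full data back before the next batch
          restore
      }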



      • #18
        With "current" meaning "within the given period only", the puzzle becomes much simpler. As regards "different", to me Paula's trial code in #14 implies that the focus is on identifying the (indirect) relation itself rather than the (indirect) person. So if a physician has two or more indirect relations with a person, some of which arose in the (same) current hospital, the code would not drop this (indirect) person, only the (untargeted) relation(s). A little more clarity might therefore still be needed here. For the coding, though, this issue (if any) should not be a tricky one.

        The code below, which uses -rangejoin-, a wonderful package by Robert Picard, is expected to work for large datasets.
        Code:
        clear
        input float period str1 phy_id str2 hosp
        1 "A" "UU"
        1 "G" "UU"
        1 "B" "VV"
        1 "C" "VV"
        1 "D" "WW"
        1 "F" "WW"
        1 "G" "WW"
        1 "D" "XX"
        1 "E" "XX"
        2 "A" "YY"
        2 "B" "YY"
        2 "D" "ZZ"
        2 "F" "ZZ"
        3 "A" "ZZ"
        3 "D" "ZZ"
        end
        
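        * keep a copy of the physician-hospital-period data
        tempfile original output
        save `original', replace
        
        * pair each physician with direct colleagues (same hospital, same period)
        ren phy_id d_phy_id
        joinby using `original'
        drop if phy_id == d_phy_id
        save `output', replace
        
        * for each pair, look up the direct colleague's own colleagues in earlier periods
        rangejoin period . -1 using `output', by(d_phy_id) p(i_)
        * drop earlier links from the current hospital, and links back to the same physician
        drop if i_hosp == hosp | phy_id == i_phy_id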
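        In case it helps anyone running this: -rangejoin- is available from SSC and, as far as I know, also needs -rangestat- to be installed. The sketch below shows the one-time installation, plus an optional line (an assumption about what is wanted, not part of the code above) that removes observations for which no earlier-period match was found and which therefore carry an empty i_phy_id.
        Code:
        * one-time installation from SSC
        ssc install rangestat
        ssc install rangejoin
        
        * optional, run after the code above: drop observations without any match
        drop if missing(i_phy_id)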
        Last edited by Romalpa Akzo; 05 Feb 2019, 21:45.

