Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Stata inlist Command Running Slowly

    Hi All,

    I am trying to match children and parents to each other using their IDs. I have code that works, but it takes very long (a few hours). I currently have about 3.1 million observations, with 100,214 unique IDs. Here is a small example of the data and my code:

    ID = each person's unique ID
    ID_M = ID of a person's mother
    ID_AM = ID of a person's adopted mother
    ID_F = ID of a person's father
    ID_AF = ID of a person's adopted father

    I am creating binary variables "parents_present," "f_children_present," "m_children_present," "af_children_present," "am_children_present" that indicate if

    1) a person has parents present
    2) a father has children present
    3) a mother has children present
    4) an adopted father has children present
    5) an adopted mother has children present.


    Is there are a way to do this more quickly? Any suggestions would be appreciated!

    Best,
    Cora


    g parents_present = 0
    levelsof ID, local(levels)
    foreach l of local levels {
    replace parents_present = 1 if inlist(`l',ID_F,ID_M,ID_AF,ID_AM)
    }

    g f_children_present = 0
    levelsof ID_F, local(levels)
    foreach l of local levels {
    replace f_children_present = 1 if inlist(`l',ID)
    }

    g m_children_present = 0
    levelsof ID_M, local(levels)
    foreach l of local levels {
    replace m_children_present = 1 if inlist(`l',ID)
    }

    g af_children_present = 0
    levelsof ID_AF, local(levels)
    foreach l of local levels {
    replace af_children_present = 1 if inlist(`l',ID)
    }

    g am_children_present = 0
    levelsof ID_AM, local(levels)
    foreach l of local levels {
    replace am_children_present = 1 if inlist(`l',ID)
    }

    ************************************************** ************************************************** *****************
    ************************************************** ************************************************** *****************

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float ID int year float(ID_F ID_M ID_AF ID_AM)
    4003 1968 4001 4002 . .
    4003 1969 4001 4002 . .
    4003 1970 4001 4002 . .
    4003 1971 4001 4002 . .
    4003 1972 4001 4002 . .
    4003 1973 4001 4002 . .
    4003 1974 4001 4002 . .
    4003 1975 4001 4002 . .
    4003 1976 4001 4002 . .
    4003 1977 4001 4002 . .
    4003 1978 4001 4002 . .
    4003 1979 4001 4002 . .
    4003 1980 4001 4002 . .
    4003 1981 4001 4002 . .
    4003 1982 4001 4002 . .
    4003 1983 4001 4002 . .
    4003 1984 4001 4002 . .
    4003 1985 4001 4002 . .
    4003 1986 4001 4002 . .
    4003 1987 4001 4002 . .
    4003 1988 4001 4002 . .
    4003 1989 4001 4002 . .
    4003 1990 4001 4002 . .
    4003 1991 4001 4002 . .
    4003 1992 4001 4002 . .
    4003 1993 4001 4002 . .
    4003 1994 4001 4002 . .
    4003 1995 4001 4002 . .
    4003 1996 4001 4002 . .
    4003 1997 4001 4002 . .
    4003 1999 4001 4002 . .
    4005 1968 4001 4002 . .
    4005 1969 4001 4002 . .
    4005 1970 4001 4002 . .
    4005 1971 4001 4002 . .
    4005 1972 4001 4002 . .
    4005 1973 4001 4002 . .
    4005 1974 4001 4002 . .
    4005 1975 4001 4002 . .
    4005 1976 4001 4002 . .
    4005 1977 4001 4002 . .
    4005 1978 4001 4002 . .
    4005 1979 4001 4002 . .
    4005 1980 4001 4002 . .
    4005 1981 4001 4002 . .
    4005 1982 4001 4002 . .
    4005 1983 4001 4002 . .
    4005 1984 4001 4002 . .
    4005 1985 4001 4002 . .
    4005 1986 4001 4002 . .
    4005 1987 4001 4002 . .
    4005 1988 4001 4002 . .
    4005 1989 4001 4002 . .
    4005 1990 4001 4002 . .
    4005 1991 4001 4002 . .
    4005 1992 4001 4002 . .
    4005 1993 4001 4002 . .
    4005 1994 4001 4002 . .
    4005 1995 4001 4002 . .
    4005 1996 4001 4002 . .
    4005 1997 4001 4002 . .
    4005 1999 4001 4002 . .
    4030 1968    . 4005 . .
    4030 1969    . 4005 . .
    4030 1970    . 4005 . .
    4030 1971    . 4005 . .
    4030 1972    . 4005 . .
    4030 1973    . 4005 . .
    4030 1974    . 4005 . .
    4030 1975    . 4005 . .
    4030 1976    . 4005 . .
    4030 1977    . 4005 . .
    4030 1978    . 4005 . .
    4030 1979    . 4005 . .
    4030 1980    . 4005 . .
    4030 1981    . 4005 . .
    4030 1982    . 4005 . .
    4030 1983    . 4005 . .
    4030 1984    . 4005 . .
    4030 1985    . 4005 . .
    4030 1986    . 4005 . .
    4030 1987    . 4005 . .
    4030 1988    . 4005 . .
    4030 1989    . 4005 . .
    4030 1990    . 4005 . .
    4030 1991    . 4005 . .
    4030 1992    . 4005 . .
    4030 1993    . 4005 . .
    4030 1994    . 4005 . .
    4030 1995    . 4005 . .
    4030 1996    . 4005 . .
    4030 1997    . 4005 . .
    4030 1999    . 4005 . .
    4031 1968 4003 4173 . .
    4031 1969 4003 4173 . .
    4031 1970 4003 4173 . .
    4031 1971 4003 4173 . .
    4031 1972 4003 4173 . .
    4031 1973 4003 4173 . .
    4031 1974 4003 4173 . .
    end

  • #2
    Rather than looping over a list of IDs, creating a dataset or frame of distinct IDs and then merging datasets or linking frames will likely be more efficient.
    Code:
    frame put ID, into(idlist)
    frame idlist {
        duplicates drop
    }
    
    frlink m:1 ID_F, frame(idlist ID) generate(link_F)
    frlink m:1 ID_M, frame(idlist ID) generate(link_M)
    
    generate parents_present = link_F!=. | link_M!=.
    tab ID parents_present
    
    drop link_F link_M
    frame drop idlist
    Code:
    . tab ID parents_present
    
               |    parents_present
            ID |         0          1 |     Total
    -----------+----------------------+----------
          4003 |        31          0 |        31
          4005 |        31          0 |        31
          4030 |         0         31 |        31
          4031 |         0          7 |         7
    -----------+----------------------+----------
         Total |        62         38 |       100
    Code:
    preserve
    keep ID
    rename ID ID_match
    duplicates drop
    tempfile idlist
    save `idlist'
    restore
    
    generate ID_match = ID_F
    merge m:1 ID_match using `idlist', keep(master match) generate(merge_F) 
    replace ID_match = ID_M
    merge m:1 ID_match using `idlist', keep(master match) generate(merge_M)
    drop ID_match
    
    generate parents_present = merge_F==3 | merge_M==3
    tab ID parents_present
    
    drop merge_F merge_M
    Code:
    . tab ID parents_present
    
               |    parents_present
            ID |         0          1 |     Total
    -----------+----------------------+----------
          4003 |        31          0 |        31 
          4005 |        31          0 |        31 
          4030 |         0         31 |        31 
          4031 |         0          7 |         7 
    -----------+----------------------+----------
         Total |        62         38 |       100
    Last edited by William Lisowski; 17 Feb 2022, 14:22.

    Comment


    • #3
      Thanks, William!

      Edited because I answered my own question. This worked well for me - thanks again!!

      Last edited by Cora Touchstone; 17 Feb 2022, 16:35.

      Comment

      Working...
      X