
  • Bug found in nearmrg command (?)

    Hello everyone,

    After hours of debugging my code, I think I have found a bug in the nearmrg command that appears only when the data contain more than one value of group_id.
    I hope someone can point out what I am doing incorrectly, so that this turns out to be my mistake rather than a bug.

    Here is the code to replicate the bug:

    Code:
    clear
    input int group_id long personal_id int tomatch
    96   11702  0
    96   45103  0
    96   50003  1
    96   62303  1
    96   95003  1
    96   97303  2
    96  125403  2
    96  126604  3
    96  145404  4
    96  172602  4
    96  177304  5
    96  217002  6
    96  228603  6
    96  229803  7
    96  236302  7
    96  236303  7
    96  248503  8
    96  267003  8
    96  289303  8
    96  295902  9
    96  297403  9
    96  316902 10
    96  328703 10
    96  328704 10
    96  329003 11
    96  329004 11
    96  329504 11
    96  348703 12
    96  349304 13
    96  349503 14
    96  354803 15
    96  365003 15
    96  365004 15
    96  383804 16
    96  395002 17
    96  425203 18
    96  440403 18
    96  441603 18
    96  451503 19
    96  486404 20
    96  488404 20
    96  496604 20
    96  504303 20
    96  504804 21
    96  509403 21
    96  547303 22
    96  562903 23
    96  566604 23
    96  573904 23
    96  585303 23
    96  587304 24
    96  588504 26
    96  596102 26
    96  619903 27
    96  663205 28
    96  676605 28
    96  683403 29
    96  698203 29
    96  704502 30
    96  710404 30
    96  719003 31
    96  730904 32
    96  746702 32
    96  788903 33
    96  810003 34
    96  817404 34
    96  824203 35
    96  831802 36
    96  843304 36
    96  873503 37
    96  878103 37
    96  878104 38
    96  922603 38
    96  927802 39
    96  928804 40
    96  956903 40
    96  958002 42
    96  958003 43
    96  972303 44
    96  977702 45
    96  988902 46
    96  991303 46
    96  994305 47
    96 1006003 48
    96 1039904 48
    96 1049503 48
    96 1080304 49
    96 1112704 49
    96 1118703 50
    96 1118705 50
    96 1131504 50
    96 1132904 51
    96 1153902 51
    96 1159303 51
    96 1161904 52
    96 1170903 52
    96 1170904 52
    96 1186602 53
    96 1188404 54
    96 1204004 54
    96 1208603 54
    96 1215903 55
    96 1223603 55
    96 1264703 55
    96 1269602 56
    96 1270704 56
    96 1280903 57
    96 1284904 57
    96 1310403 58
    96 1314804 58
    96 1322902 58
    96 1326903 59
    96 1337603 59
    96 1338103 59
    96 1358404 60
    96 1374503 60
    96 1378203 61
    96 1391604 61
    96 1408603 62
    96 1412503 63
    96 1425803 63
    96 1456803 64
    96 1485403 65
    96 1493903 65
    96 1517104 66
    96 1527503 66
    96 1530303 67
    96 1537003 67
    96 1582502 68
    96 1593102 68
    96 1595505 68
    96 1665603 69
    96 1674602 69
    96 1692703 69
    96 1783704 70
    96 1792603 71
    96 1805503 72
    96 1823203 72
    96 1830103 72
    96 1884203 73
    96 1890104 74
    96 1896604 74
    96 1900502 75
    96 1920302 75
    96 1933802 76
    96 1943503 76
    96 1963603 77
    96 1967304 77
    96 1967404 78
    96 1971603 78
    96 2012003 78
    96 2012004 79
    96 2018904 79
    96 2022403 79
    96 2032703 80
    96 2043803 80
    96 2049204 80
    96 2066604 81
    96 2074803 81
    96 2097203 82
    96 2111603 83
    96 2129403 83
    96 2147803 85
    96 2154803 85
    96 2214004 86
    96 2243403 87
    96 2257603 87
    96 2281303 88
    96 2283602 88
    96 2285702 89
    96 2289702 89
    96 2301304 91
    96 2319803 91
    96 2334003 92
    96 2342202 92
    96 2349404 92
    96 2375003 92
    96 2376303 93
    96 2390704 93
    96 2408404 93
    96 2415802 93
    96 2420803 94
    96 2424905 94
    96 2448002 95
    96 2453904 96
    96 2462603 96
    96 2478602 97
    96 2481202 97
    96 2483102 97
    96 2519302 97
    96 2626703 98
    96 2655003 98
    96 2669202 99
    96 2683203 99
    98  103403 14
    98  246303 22
    98  613804 29
    98  640802 32
    98  985202 43
    98 1169103 57
    98 1207802 64
    98 1328702 68
    98 1362303 70
    98 1551005 72
    98 1880902 74
    98 2031002 75
    98 2193403 76
    98 2450802 82
    98 2700503 99
    end
    
    tempfile data1
    save `data1'
    
    clear
    input int(group_id tomatch)
    96 14
    96 19
    96 32
    96 39
    96 51
    96 86
    96 96
    96 99
    98 99
    end
    
    tempfile data2 
    save `data2'
    
    use `data1', clear
    
    nearmrg group_id using `data2', near(tomatch)  genmatch(tomatch_data2) type(m:1)
    
    order group_id tomatch tomatch_data2
    sort group_id tomatch

    If you scroll down through the "tomatch" variable, you can see that in some instances it is undoubtedly matched with the wrong value from data2. There is also some source of randomness: the mismatches are not always the same, and running the code several times gives a different mismatch every time!
    Here is an example of what I mean by a mismatch: tomatch_data2 should be 39 when tomatch is 39 but, oddly enough, it is 32.
    group_id tomatch tomatch_data2
    96 38 39
    96 39 32
    96 40 39
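
    A quick way to surface such rows, as a minimal sketch run right after the nearmrg call above: every group 96 observation whose tomatch value also appears in data2 should be matched to itself, so any nonzero gap flags a mismatch. The variable name gap is only illustrative.
    Code:
    * distance between each observation and the value nearmrg matched it to
    generate gap = abs(tomatch - tomatch_data2)
    * group 96 values that exist in data2 should have gap == 0;
    * rows with a missing match are listed too, since missing counts as > 0
    list group_id personal_id tomatch tomatch_data2 gap if group_id == 96 & inlist(tomatch, 14, 19, 32, 39, 51, 86, 96, 99) & gap > 0, noobs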

    Obviously this is causing me a lot of pain. Can someone point out what I am doing wrong, or suggest another command for nearest matching between datasets?

    Any help would, of course, be greatly appreciated.
    All the best,
    D.

  • #2
    I have duplicated your experience, except that where you show 39 matched to 32, I show 14 matched to 19 and 19 matched to 14.

    I suggest you contact the current author at the email address given in the output of help nearmrg.

    The code below demonstrates a technique for doing what you seek, using your example data and the community-contributed rangejoin command available from SSC.
    Code:
    /* these need to be installed
    ssc install rangejoin
    ssc install rangestat
    */
    
    use `data2', clear
    isid group_id tomatch
    bysort group_id (tomatch): generate low  = (tomatch+tomatch[_n-1])/2
    bysort group_id (tomatch): generate high = (tomatch+tomatch[_n+1])/2-.01
    rename tomatch tomatch_ref
    list, clean abbreviate(16)
    rangejoin tomatch low high using `data1', by(group_id)
    drop low high
    order tomatch_ref, last
    sort group_id tomatch
    
    egen pair = tag(group_id tomatch tomatch_ref)
    list group_id tomatch tomatch_ref if pair, sepby(tomatch_ref) noobs abbreviate(16)
    Code:
    . /* these need to be installed
    > ssc install rangejoin
    > ssc install rangestat
    > */
    . 
    . use `data2', clear
    
    . isid group_id tomatch
    
    . bysort group_id (tomatch): generate low  = (tomatch+tomatch[_n-1])/2
    (2 missing values generated)
    
    . bysort group_id (tomatch): generate high = (tomatch+tomatch[_n+1])/2-.01
    (2 missing values generated)
    
    . rename tomatch tomatch_ref
    
    . list, clean abbreviate(16)
    
           group_id   tomatch_ref    low    high  
      1.         96            14      .   16.49  
      2.         96            19   16.5   25.49  
      3.         96            32   25.5   35.49  
      4.         96            39   35.5   44.99  
      5.         96            51     45   68.49  
      6.         96            86   68.5   90.99  
      7.         96            96     91   97.49  
      8.         96            99   97.5       .  
      9.         98            99      .       .  
    
    . rangejoin tomatch low high using `data1', by(group_id)
      (using rangestat version 1.1.1)
    
    . drop low high
    
    . order tomatch_ref, last
    
    . sort group_id tomatch
    
    . 
    . egen pair = tag(group_id tomatch tomatch_ref)
    
    . list group_id tomatch tomatch_ref if pair, sepby(tomatch_ref) noobs abbreviate(16)
    
      +----------------------------------+
      | group_id   tomatch   tomatch_ref |
      |----------------------------------|
      |       96         0            14 |
      |       96         1            14 |
      |       96         2            14 |
      |       96         3            14 |
      |       96         4            14 |
      |       96         5            14 |
      |       96         6            14 |
      |       96         7            14 |
      |       96         8            14 |
      |       96         9            14 |
      |       96        10            14 |
      |       96        11            14 |
      |       96        12            14 |
      |       96        13            14 |
      |       96        14            14 |
      |       96        15            14 |
      |       96        16            14 |
      |----------------------------------|
      |       96        17            19 |
      |       96        18            19 |
      |       96        19            19 |
      |       96        20            19 |
      |       96        21            19 |
      |       96        22            19 |
      |       96        23            19 |
      |       96        24            19 |
      |----------------------------------|
      |       96        26            32 |
      |       96        27            32 |
      |       96        28            32 |
      |       96        29            32 |
      |       96        30            32 |
      |       96        31            32 |
      |       96        32            32 |
      |       96        33            32 |
      |       96        34            32 |
      |       96        35            32 |
      |----------------------------------|
      |       96        36            39 |
      |       96        37            39 |
      |       96        38            39 |
      |       96        39            39 |
      |       96        40            39 |
      |       96        42            39 |
      |       96        43            39 |
      |       96        44            39 |
      |----------------------------------|
      |       96        45            51 |
      |       96        46            51 |
      |       96        47            51 |
      |       96        48            51 |
      |       96        49            51 |
      |       96        50            51 |
      |       96        51            51 |
      |       96        52            51 |
      |       96        53            51 |
      |       96        54            51 |
      |       96        55            51 |
      |       96        56            51 |
      |       96        57            51 |
      |       96        58            51 |
      |       96        59            51 |
      |       96        60            51 |
      |       96        61            51 |
      |       96        62            51 |
      |       96        63            51 |
      |       96        64            51 |
      |       96        65            51 |
      |       96        66            51 |
      |       96        67            51 |
      |       96        68            51 |
      |----------------------------------|
      |       96        69            86 |
      |       96        70            86 |
      |       96        71            86 |
      |       96        72            86 |
      |       96        73            86 |
      |       96        74            86 |
      |       96        75            86 |
      |       96        76            86 |
      |       96        77            86 |
      |       96        78            86 |
      |       96        79            86 |
      |       96        80            86 |
      |       96        81            86 |
      |       96        82            86 |
      |       96        83            86 |
      |       96        85            86 |
      |       96        86            86 |
      |       96        87            86 |
      |       96        88            86 |
      |       96        89            86 |
      |----------------------------------|
      |       96        91            96 |
      |       96        92            96 |
      |       96        93            96 |
      |       96        94            96 |
      |       96        95            96 |
      |       96        96            96 |
      |       96        97            96 |
      |----------------------------------|
      |       96        98            99 |
      |       96        99            99 |
      |       98        14            99 |
      |       98        22            99 |
      |       98        29            99 |
      |       98        32            99 |
      |       98        43            99 |
      |       98        57            99 |
      |       98        64            99 |
      |       98        68            99 |
      |       98        70            99 |
      |       98        72            99 |
      |       98        74            99 |
      |       98        75            99 |
      |       98        76            99 |
      |       98        82            99 |
      |       98        99            99 |
      +----------------------------------+
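
    If installing rangejoin is not an option, the same nearest match can probably be sketched with built-in commands only, by pairing each observation with every reference value in its group and keeping the closest. This is a minimal sketch, assuming it runs in the same do-file so that `data1' and `data2' are still defined, and that personal_id identifies observations within group_id; the names ref, dist, and negref are illustrative. Ties go to the larger reference value, which mirrors the midpoint rule above (45 is assigned to 51, not 39).
    Code:
    use `data2', clear
    rename tomatch tomatch_ref
    tempfile ref
    save `ref'

    use `data1', clear
    * stop early if personal_id does not identify observations within group
    isid group_id personal_id
    * form all pairwise combinations of observations and reference values within group
    joinby group_id using `ref'
    generate dist   = abs(tomatch - tomatch_ref)
    generate negref = -tomatch_ref
    * keep the closest reference value; ties go to the larger one
    bysort group_id personal_id (dist negref): keep if _n == 1
    drop dist negref
    sort group_id tomatch
    Because joinby creates every within-group pair before the closest one is kept, this may be impractical for large groups, where the rangejoin approach should scale better.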



    • #3
      Thank you William,
      That is a nice workaround. I'll try to implement it.
      In the meanwhile I emailed the current maintainer of nearmrg. I'll post here if I receive an answer from him.
      Best,
      D.
