  • Case control matching with age, gender and BMI

    Hi
    I am trying to match cases to controls exactly on gender, on age within +/- 5 years, and on BMI within +/- 3. The code below produces matches, but some matched pairs have BMI values outside the +/- 3 range. Could someone see what is wrong with this code? Thanks


    clear

    ** creating matched data for age (+/- 5), gender(exact match) and BMI (-/+ 3)

    ******************************Data preparation task********************************************** ********

    use "F:\OSA data\Latestcode\AllwithOSAdata_191121latest.dta"

    ** create cases subset
    keep if id_casecntrl==1
    keep if flag==1
    rename id id_case
    save "F:\OSA data\Latestcode\Cases1.dta", replace

    ** create controls subset
    use "F:\OSA data\Latestcode\AllwithOSAdata_191121latest.dta"
    keep if id_casecntrl==2
    rename id id_cntl
    save "F:\OSA data\Latestcode\Controls1.dta", replace

    gen rand = runiform()
    sort rand
    drop rand

    save "F:\OSA data\Latestcode\Controls2.dta", replace

    *rename * *_cntl
    *rename id_cntl id
    *duplicates drop id, force
    *save "C:\Users\venka\Desktop\NSWHealth\Venkatesha - Consults\1970 - Premala Sureshkumar\Controls3.dta", replace

    ******************************End of Data preparation task********************************************** ********


    *Read the cases data file. Replace the file path of the data set appropriately in the program
    use "F:\OSA data\Latestcode\Cases1.dta"

    * matching (exact) on Gender, within +/- 5 years for age
    compress
    rangejoin ageatvisit -5 5 using "F:\OSA data\Latestcode\Controls2.dta", by (gender)

    order id_case id_cntl gender ageatvisit
    drop *_U

    gen rand = runiform()
    sort rand
    drop rand

    *rename *_U *_cntl
    *rename id id_cases
    *sort id_cases
    *drop if id_casecntrl_cntl==.

    *use matched control only twice for each matched case(preserving 1:2 case : control ratio)
    *bysort id_cases: keep if _n <= 2

    *Check how many controls were found for every case
    *bysort id_cases: gen byte numcontrols = _N if _n == 1
    *tab numcontrols
    *drop if numcontrols == 1
    *drop numcontrols

    ** Matching on age and gender is complete.

    *rename id_cntl id
    *drop *_cntl

    *gen rand = runiform()
    *sort rand
    *drop rand

    * matching within +/- 3 units of BMI

    rangejoin bmi -3 3 using "F:\OSA data\Latestcode\Controls2.dta", by (id_cntl)
    drop if ageatvisit_U==.
    drop if gender_U==""

    order id_case id_cntl gender gender_U ageatvisit ageatvisit_U bmi bmi_U

    drop *_U
    *sort id_case

    *use matched control only twice for each matched case(preserving 1:2 case : control ratio)

    bysort id_case id_cntl: keep if _n == 1
    bysort id_case: keep if _n <= 2

    *Check how many controls were found for every case
    bysort id_case: gen byte numcontrols = _N if _n ==1
    tab numcontrols
    drop if numcontrols == 1
    drop numcontrols

    rename * *_case
    rename (id_case_case id_cntl_case) (id_case id_cntl)

    *drop *_U
    *rename * *_case
    *rename (id_cases_case id_case) (id_case id)

    save "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI0.dta", replace


    use "F:\OSA data\Latestcode\Controls2.dta"

    rename * *_cntl
    rename id_cntl_cntl id_cntl
    duplicates drop id_cntl, force

    save "F:\OSA data\Latestcode\Controls3.dta", replace

    use "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI0.dta"

    merge m:m id_cntl using "F:\OSA data\Latestcode\Controls3.dta"

    order id_case id_cntl gender_case gender_cntl ageatvisit_case ageatvisit_cntl bmi_case bmi_cntl
    drop if id_case==""
    drop _merge

    bysort id_case id_cntl: keep if _n == 1

    *Check how many controls were found for every case
    bysort id_case: gen byte numcontrols = _N if _n ==1
    tab numcontrols
    drop if numcontrols == 1
    drop numcontrols
    sort id_case

    order id_case id_cntl gender_case gender_cntl ageatvisit_case ageatvisit_cntl bmi_case bmi_cntl

    save "F:\OSA data\Latestcode\MatchedData_08December\Matched_Age GenderBMI1.dta", replace
    ** Matching on age,gender and BMI is complete.

  • #2
    Your code is very long and complicated. The fact that a large number of lines are actually commented out but look like active code makes it even harder to follow. That there is no example data shown makes it almost impossible to see what is happening. It appears you want to do 2:1 matching of controls with cases, with matching on gender and caliper matching on age and BMI. It is less clear whether you want controls to be sampled with or without replacement. Since the code for sampling with replacement is simpler, I will show you how I would approach this with replacement.

    Code:
    //  CREATE DATA SET TO DEMONSTRATE THE APPROACH
    clear*
    set obs 500
    set seed 1234
    gen int id = _n
    label define case_control   1   "Case"  0   "Control"
    gen byte case_control:case_control = (_n <= 100)
    label define sex    0   "Male"  1   "Female"
    gen byte sex:sex = runiformint(0, 1)
    gen bmi = rgamma(15, 2)
    gen age = rnormal(50, 10)
    
    //  MATCHING PROCESS STARTS HERE
    preserve
    keep if case_control == "Control":case_control
    drop case_control
    tempfile controls
    save `controls'
    
    restore
    keep if case_control == "Case":case_control
    drop case_control
    rangejoin age -5 5 using `controls', by(sex)
    keep if inrange(bmi_U - bmi, -3, 3)
    gen double shuffle = runiform()
    by id (shuffle), sort: keep if _n <= 2
    drop shuffle
    rename (id-age) =_case
    rename *_U *_ctrl
    
    assert abs(age_case - age_ctrl) <= 5
    assert abs(bmi_case - bmi_ctrl) <= 3
    A few things to note: even though there is a considerable surplus of controls over cases in this made-up data, the matching requirements are so strict that, with these distributions of age and BMI (which are fairly realistic for many types of populations), we are unable to find 2 matches for a lot of cases, and a handful of cases find no match at all. Perhaps in your real data the number of available controls will be large enough, or the ranges of age and BMI restricted enough, that you won't encounter that problem. But be prepared for it.
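
    To see how severe the shortfall is, one can tabulate the number of controls found per case. What follows is only a sketch on a made-up three-row matched file: the variable names id_case and id_ctrl follow the renaming used in the code above, n_ctrl and one_per_case are names invented here, and n_cases_total is a hypothetical placeholder for the number of cases before matching. Run it as a whole, since it relies on a local macro.

    ```stata
    clear
    // made-up matched pairs: case 1 found two controls, case 2 found one
    input int id_case int id_ctrl
    1 101
    1 102
    2 103
    end
    local n_cases_total = 4   // hypothetical: number of cases before matching

    // number of controls retained for each case
    by id_case, sort: gen byte n_ctrl = _N
    // mark one observation per case so each case is counted once
    egen byte one_per_case = tag(id_case)
    tabulate n_ctrl if one_per_case

    // cases that dropped out entirely because no control qualified
    count if one_per_case
    display "cases with no match: " `n_cases_total' - r(N)
    ```

    Cases with no qualifying control are removed by the -keep if inrange()- step, so they can only be counted by comparison with the original number of cases, as in the last two lines.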

    One aside: I do not know if it is related to the mismatched BMIs you are experiencing in your code, but I can tell you that the -merge m:m- command you are using is just wrong. -merge m:m- is almost always wrong, and I would be astonished if it played any legitimate role in your situation. More likely what you want to do there is accomplished by -joinby-. -merge m:m- just produces a strange kind of data salad that is almost never useful.
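
    The difference is easy to see on a toy example. The sketch below uses made-up identifiers and values (none of it comes from your data): two case-control pairs share one control id, and that id has three observations in the controls file. -merge m:m- pairs observations by their position within the id and returns 3 observations, whereas -joinby- forms all 2 x 3 = 6 combinations.

    ```stata
    clear
    // made-up controls file: one control id with three observations
    input str4 id_cntl double bmi_cntl
    "K1" 29.9
    "K1" 30.9
    "K1" 36
    end
    tempfile controls
    save `controls'

    clear
    // made-up matched pairs: two cases, both matched to control K1
    input str4 id_case str4 id_cntl
    "C1" "K1"
    "C2" "K1"
    end
    tempfile pairs
    save `pairs'

    // merge m:m: 1st with 1st, 2nd with 2nd, leftover using rows reuse the last master row
    merge m:m id_cntl using `controls'
    list id_case id_cntl bmi_cntl, noobs
    assert _N == 3

    // joinby: every pairwise combination within id_cntl
    use `pairs', clear
    joinby id_cntl using `controls'
    list id_case id_cntl bmi_cntl, noobs
    assert _N == 6
    ```

    Which BMI value ends up next to which case after -merge m:m- depends only on row order, which is why it can attach a control observation that never satisfied the matching criteria.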

    In the future, when asking for help troubleshooting code, always show example data, and use the -dataex- command for that purpose. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
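
    As a minimal illustration of the usage, here it is on the auto data that ships with Stata; the variable list and the observation range are arbitrary choices for the sketch, and with your own data you would name the variables relevant to your question.

    ```stata
    sysuse auto, clear
    // export the first five observations of four variables as a postable example
    dataex make price mpg foreign in 1/5
    ```

    Copy the entire output, including the [CODE] delimiters, and paste it into your post.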

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
    Last edited by Clyde Schechter; 27 Feb 2022, 20:52.



    • #3
      Thanks Clyde for the code. The -preserve- and -restore- commands are not working for me: Stata could not restore the data. I have now attached an example dataset. Could you please check whether the code runs on it? Also note that, among the cases (id_casecntrl == 1), I only want to match those with flag == 1. Thanks.

      input str30 id double(id_casecntrl flag ageatvisit) str24 gender double(gender_cat bmi)
      "1147" 1 . 55.709787816563995 "Male" 1 24
      "1147" 1 1 58.59000684462697 "Male" 1 23.3
      "1149" 2 . 30.015058179329227 "Female" 2 22.3
      "1054" 2 . 51.624914442162904 "Male" 1 34.2
      "1234" 2 . 45.71663244353183 "Male" 1 32.4
      "1187" 2 . 38.01505817932922 "Female" 2 20.8
      "1187" 2 . 40.40246406570842 "Female" 2 20.8
      "1097" 2 . 59.08555783709788 "Female" 2 29.9
      "1097" 2 . 60.2984257357974 "Female" 2 30.9
      "1097" 2 . 59.66598220396988 "Female" 2 36
      "1097" 2 . 57.667351129363446 "Female" 2 32.1
      "1122" 2 . 46.60917180013689 "Female" 2 40.3
      "1122" 2 . 45.93292265571527 "Female" 2 40.1
      "1167" 2 . 29.229295003422312 "Male" 1 27.8
      "1187" 2 . 60.785763175906915 "Male" 1 35.7
      "1129" 2 . 43.26899383983573 "Male" 1 32.7
      "1129" 2 . 44.74469541409993 "Male" 1 30.6
      "1129" 2 . 40.77754962354552 "Male" 1 35.6
      "1129" 2 . 41.275838466803556 "Male" 1 32.8
      "1129" 2 . 46.80355920602327 "Male" 1 43
      "1143" 2 . 27.86858316221766 "Male" 1 28.3
      "1143" 2 . 31.917864476386036 "Male" 1 29.8
      "1143" 2 . 32.53661875427789 "Male" 1 27.3
      "1156" 2 . 31.496235455167692 "Male" 1 24.8
      "1156" 2 . 33.6290212183436 "Male" 1 26.2
      "1135" 2 . 45.724845995893226 "Female" 2 29.4
      "1165" 2 . 58.392881587953454 "Female" 2 46.7
      "1174" 2 . 57.16632443531827 "Female" 2 44.9
      "1174" 2 . 59.42778918548939 "Female" 2 45.1
      "1176" 2 . 66.43668720054757 "Female" 2 25.3
      "1176" 2 . 67.7015742642026 "Female" 2 28
      "1207" 2 . 55.54551676933607 "Female" 2 46.9
      "1207" 2 . 52.5886379192334 "Female" 2 44.8
      "1345" 2 . 51.7700205338809 "Male" 1 29.2
      "1345" 2 . 49.87268993839836 "Male" 1 28.1
      "1245" 2 . 54.35728952772074 "Male" 1 28.1
      "1556" 2 . 50.86926762491444 "Male" 1 28.6
      "1559" 2 . 29.579739904175224 "Male" 1 40.4
      "1365" 2 . 31.89596167008898 "Female" 2 24.9
      "1432" 2 . 43.19233401779603 "Female" 2 30
      "1345" 2 . 32.33949349760438 "Female" 2 45.6
      "1345" 2 . 31.89869952087611 "Female" 2 48.4
      "1345" 2 . 33.661875427789184 "Female" 2 45
      "1564" 2 . 35.73169062286105 "Male" 1 18.8
      "1564" 2 . 40.75838466803559 "Male" 1 19.8
      "1564" 2 . 38.72142368240931 "Male" 1 19.1
      "1456" 2 . 44.57494866529774 "Male" 1 30.6
      "1456" 2 . 34.00958247775496 "Male" 1 36.4
      "1400" 2 . 31.78097193702943 "Male" 1 33.1
      "1400" 2 . 30.496919917864478 "Male" 1 33.3
      "1376" 2 . 44.26009582477755 "Male" 1 30.5
      "1376" 2 . 45.75496235455168 "Male" 1 30.8
      "1389" 2 . 50.83093771389459 "Female" 2 45.9
      "1478" 1 1 49.256673511293634 "Male" 1 18
      "1354" 2 . 37.80698151950719 "Female" 2 32.3
      "4242" 2 . 36.733744010951405 "Female" 2 33.4
      "1503" 2 . 41.14715947980835 "Male" 1 30.6
      "1503" 2 . 42.143737166324435 "Male" 1 31.7
      "1503" 2 . 40.13141683778234 "Male" 1 30.5
      "1522" 1 1 41.333333333333336 "Female" 2 42.8
      "1524" 2 . 44.15058179329227 "Female" 2 42.7
      "1501" 1 . 43.134839151266256 "Female" 2 42.4
      "1501" 1 1 45.12799452429842 "Female" 2 39.7
      "1501" 1 . 41.73305954825462 "Female" 2 37.4
      "1115" 2 . 41.11978097193703 "Female" 2 35.5
      "1115" 2 . 42.23134839151266 "Female" 2 36.1
      "1683" 2 . 57.67830253251198 "Female" 2 26.1
      "1686" 2 . 60.591375770020534 "Female" 2 28.5
      "1575" 2 . 59.59479808350445 "Female" 2 26.6
      "1522" 2 . 61.58795345653662 "Female" 2 29.6
      "1455" 2 . 58.559890485968516 "Female" 2 26.9
      "1573" 2 . 51.318275154004105 "Female" 2 28.9
      "1489" 2 . 43.266255989048595 "Male" 1 45.3
      "1522" 2 . 45.68925393566051 "Male" 1 35.8
      "1955" 1 1 47.337440109514034 "Male" 1 35.9
      "1953" 2 . 21.697467488021903 "Male" 1 24.9
      "1455" 2 . 53.59342915811088 "Male" 1 38
      "1455" 2 . 52.11772758384668 "Male" 1 26.3
      "1455" 2 . 55.16495550992471 "Male" 1 26.4
      "1664" 2 . 30.90759753593429 "Male" 1 29.9
      "1653" 2 . 31.616700889801507 "Male" 1 35
      "1654" 2 . 41.908281998631075 "Male" 1 23.6
      "1600" 2 . 40.79671457905544 "Male" 1 18.7
      "1001" 2 . 42.90485968514716 "Male" 1 23.8
      "1600" 1 . 41.00479123887748 "Male" 1 34.6
      "1600" 1 . 38.234086242299796 "Male" 1 52.1
      "1600" 1 1 41.28678986995209 "Male" 1 44.3
      "1544" 1 . 39.460643394934976 "Male" 1 46.1
      "1604" 2 . 47.37303216974675 "Female" 2 32.5
      "1800" 1 . 29.16084873374401 "Male" 1 28.2
      "1822" 1 . 25.456536618754278 "Male" 1 29.3
      "1865" 1 1 30.50239561943874 "Male" 1 25.4
      "1644" 2 . 53.275838466803556 "Female" 2 30.4
      "1500" 2 . 45.61259411362081 "Male" 1 32
      "1547" 2 . 42.37371663244353 "Male" 1 29.8
      "1573" 2 . 37.73579739904175 "Male" 1 32.1
      "1224" 2 . 41.223819301848046 "Male" 1 31.8
      "1653" 2 . 38.86652977412731 "Male" 1 33.3
      "1453" 2 . 37.14168377823409 "Male" 1 31
      "1354" 2 . 37.10335386721424 "Male" 1 34
      end



      • #4
        On a side note, if there is an underlying cohort in which the case-control study is nested, you should also ensure that your controls have at least the same amount of follow-up/analysis time as your cases (density sampling). The -sttocc- command, which is part of official Stata rather than user-written, will do this for you.
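
        As a rough illustration only: the sketch below follows, as far as I recall it, the example in the -sttocc- manual entry, using the diet data distributed for the Stata manuals. The dataset, its variable names, the seed, and the choice of 5 controls per case matched on job all belong to that example and have nothing to do with the data in this thread. It needs an internet connection for -webuse-.

        ```stata
        webuse diet, clear
        // declare the cohort as survival-time data: entry at doe, exit at dox, age as time scale
        stset dox, failure(fail) enter(time doe) id(id) origin(time dob) scale(365.25)
        set seed 123
        // for each failure, sample 5 controls from those still at risk, matched on job
        sttocc, match(job) n(5) nodots
        ```

        The data must be -stset- first, because the controls are drawn from the subjects still under follow-up at each case's failure time.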



        • #5
          Well, something is wrong with your data. You have multiple observations with the same id. And these are manifestly not the same person as age, gender and bmi all exhibit inconsistencies within the same id. So you have to fix that problem before you can even think about matching.
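
          One way to list the problematic ids is sketched below, using a few rows copied from the example data in #3. The variable names gender_conflict, status_conflict and first are made up for the illustration.

          ```stata
          clear
          input str30 id double(id_casecntrl flag ageatvisit) str24 gender double(gender_cat bmi)
          "1187" 2 . 38.01505817932922 "Female" 2 20.8
          "1187" 2 . 40.40246406570842 "Female" 2 20.8
          "1187" 2 . 60.785763175906915 "Male" 1 35.7
          "1522" 1 1 41.333333333333336 "Female" 2 42.8
          "1522" 2 . 61.58795345653662 "Female" 2 29.6
          "1522" 2 . 45.68925393566051 "Male" 1 35.8
          "1147" 1 . 55.709787816563995 "Male" 1 24
          "1147" 1 1 58.59000684462697 "Male" 1 23.3
          end

          // how many ids occur more than once
          duplicates report id

          // flag ids whose gender or case/control status is not constant across observations
          bysort id (gender_cat): gen byte gender_conflict = (gender_cat[1] != gender_cat[_N])
          bysort id (id_casecntrl): gen byte status_conflict = (id_casecntrl[1] != id_casecntrl[_N])

          // show one line per id
          egen byte first = tag(id)
          list id gender_conflict status_conflict if first, noobs
          ```

          An id that is flagged on either variable cannot be a single person observed repeatedly, so those are the ones to resolve first.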

          As for -preserve- and -restore- "not working," I suspect you are running the code line-by-line to see what it does step-by-step. While that's commendable from a learning perspective, it is incompatible with the use of -preserve- and -restore-. When you run a single line from a do-file, Stata treats that line as a separate do-file. And the rule for -preserve- when run from a do-file (including this kind of "separate do-file") is that the preserved data are restored at the end of that do-file, even if there is no explicit -restore- command. So in code with -preserve- and -restore-, everything from -preserve- through -restore- must be run in one fell swoop.
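
          A small illustration with the auto data that ships with Stata (the tempfile name foreign_cars is made up). Select all of these lines in the do-file editor and execute them together; the final -assert- confirms that the full dataset is back in memory after -restore-.

          ```stata
          sysuse auto, clear
          preserve
          // work on a subset while the full data are set aside
          keep if foreign
          assert _N == 22
          tempfile foreign_cars
          save `foreign_cars'
          // bring the full data back
          restore
          assert _N == 74
          ```

          For the same reason, local macros such as the one created by -tempfile- are lost when the lines are run one at a time, so later references to the temporary file fail as well.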

