  • Matching male and female patients with same/similar lab values

    Hi! I'm looking into a sex-related sub question of my bachelor's thesis about heart failure and oncomarkers. Even though male/female values do not seem significantly different at first glance, I would like to create the following:

    I want to match each female patient to a male patient who has the same/similar age and same/similar lab values (creatinine and nt-proBNP). I would then like to compare their oncomarker values.

    Thank you in advance!

  • #2
    Joliene:
    welcome to this forum.
    Without further details about your data, my advice is to take a look at -help propensity-.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you! I've looked into propensity score matching and I have gotten further. However, I can only use one 'treatment independent', while I want to use 2-3.

      there is 1 propensity score less than 1.00e-05
      treatment overlap assumption has been violated; use the osample() option to identify the observations

      Is there anything I can do about this?


      • #4
        Propensity score matching is a particular approach to matching, in which you match treated with untreated subjects based on having an aggregate similarity on all variables that are predictive of being treated. So it makes no sense to speak of propensity score matching with multiple treatments. The whole idea of propensity matching is centered around a particular treatment--and its desirable statistical properties stem from that fact. You can, for each treatment, do a new propensity score match--but you can't carry the propensity score match from one treatment over to an analysis of another treatment.

        The kind of matching you asked about in #1 is a different approach to matching: you selected the matching variables a priori and not on the basis of their value as predictors of who is and isn't treated. If you want help with this approach to matching, please post back with the additional information needed:

        1. How close in age do they have to be to count as similar?
        2. How close in nt-proBNP do they have to be to count as similar?
        3. How close in creatinine do they have to be to count as similar?

        Be warned that the narrower the range you accept for matching, the harder it will be to find matches. If you make your match criteria too stringent, you will have large numbers of females for whom there is no matching male and vice versa. On the other hand, if you make the matching criteria too loose, then the matching loses its power to refine your analysis by reducing the effects of these nuisance variables.

        Also provide an example of your data using the -dataex- command. In choosing the sample to show with -dataex-, please be sure to include some males and some females, and, in particular, some that satisfy the matching criteria and some that do not.

        If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

        When asking for help with code, always show example data. When showing example data, always use -dataex-.


        • #5
          I have never used dataex before. Is this what you mean?
          Code:
           * Example generated by -dataex-. To install:	ssc	install dataex clear input double CA125UmL float age double(creat	sex	ntprocobas) 49.36  63.45517      48 1  1686 21.26  47.81656  141.44 0  2851 .  62.77071  123.76 0 18853 66.7  62.95688     107 0 16778 5.29   53.4538   79.56 0 360.8 3.03  50.27515    78.4 0 406.8 69.31  48.62149    77.4 0 125.6 .  48.16427      51 0 340.9 11.43  50.04244     122 0 10612 7.71   58.5462  87.516 0   360 288.4  63.48255     150 0 23014 19.38  62.07529   117.6 1 962.5 39.1  49.62902    88.4 0  1705 11.22  47.47707      77 0 117.8 45.98 33.054073     100 0  4924 16.9  61.04312 104.312 0 257.5 15.33  51.34291   79.56 0 957.6 693.8  61.84805     600 0 35000 12.63  63.19781 106.964 0  1055 .  40.89254   97.24 0 194.2 9.54  40.60233    88.4 0  2711 .  42.59001     159 0  4991 373.5 64.057495     125 0  5116 65.07 37.659138  106.08 0  2233 470.7  57.74127     213 0  5134 52.11  55.95346      79 1  2394 103.8  64.12868   79.56 0  1073 13.83   57.4976    78.6 0  1101 11.18  53.97673   97.24 0  1499 320.6    42.705  106.08 0 16440 98.41  51.47707 169.728 0  1722 .   54.3655  81.328 0     . 9.61  58.29158    78.2 1  2938 27.78  63.12115     121 0  3590 6.97 64.257355 104.312 0  1524 21.81  54.99795      74 0  2819 10.82  62.77892      87 0  1401 160.1  59.32101     110 0  2837 5.92  51.64682    88.4 0     . 7.2  61.75496      79 0 394.5 .  53.89733   70.72 0     . 15.06  62.72964     141 0  3567 .  56.44353      53 0  5339 104.9  63.66325   97.24 0  1246 8.63  45.69473      56 0 475.3 219.2  56.91992  141.44 0  3506 49.53  58.04791     105 0  4038 84.35  63.68241      56 0  5143 7.61  61.60164 146.744 1  2575 6.76   54.4668      61 1 295.7 8.92   57.4319      87 1  1150 175.7  43.50445      83 0  2603 138.7  47.21424   107.9 0  8781 289.1   60.1013     279 0 29639 23.61  63.56468      93 0     . 13.17  59.87132      79 0  2578 88.48  58.05339      75 0  4957 17.29  58.69678 112.268 0 848.7 19.42  51.44422    91.3 0 84.89 .  
59.46886  123.76 1     . 384.2  53.14716   97.24 0  1253 36.72  55.10746      62 0 331.2 9.55   62.5462      94 1  4665 39.26  53.87269      64 1     . 570.6   57.0705      91 0  2397 173.2    57.577  85.748 0  1481 24.27  51.25257      43 1  2985 .  51.84942  91.936 0     . .  48.19986      92 0  2202 64.49  54.46407     144 0  2686 18.4  55.53456      81 0     . 359.6  40.96646     152 0  4142 162.7  62.35729  65.416 0  1208 .  63.49076      88 0  3896 1366  63.56194  159.12 0  2601 7.82  56.34771     120 0 184.1 10.94  59.26899      54 0   175 12.49  55.38672    88.4 0 41.58 10.36  58.95414     113 0  1053 12.62  62.83368      93 0 186.8 .  64.08214      95 1     . .  52.69268   79.56 0     . 17.23  63.23066      75 0  5396 11.83   61.7358  77.792 0 12755 6.11  62.93497 144.092 0 533.5 242.9  59.03354      76 1  1365 93.83 64.054756      97 0  9616 19.21  57.63997  95.472 0 699.2 28.24  35.14579   154.7 0  4121 14.04  63.54004 934.388 0 11987 15.72  56.55852      72 1  1832 .  55.24435       . 0     . 260.8  52.06571   79.56 1  1947 139.2  40.99932      84 0  1077 9.29  54.67214    95.3 0 297.7 .  58.89391      96 1  2637 13.71  43.61123      80 1 244.6 19.53  62.87474      71 0  1649 431.8 26.743326 109.616 0  4881 279  50.81451      72 1  5414 end label values sex sex label def sex 0 "male", modify label def sex 1 "female", modify


          • #6
            Thanks for your reply, by the way! I am not sure whether I can statistically answer your questions regarding the range of similarity, but let's say:
            1. same age is +/- 2 years
            2. nt-probnp is +/- 20.0
            3. creat is +/- 2.00


            • #7
              Thank you. Somehow your -dataex- output got mangled, with everything coming out on one line. I was able to parse it manually and run it. The code below includes the -dataex- fixed up.

              There is no statistical answer to the questions I posed earlier. They represent pragmatic judgments. On the one hand the windows should be narrow enough that people within those windows are similar in a clinically meaningful sense. On the other hand, narrow windows mean fewer possible matches and more people ending up with no admissible match.

              As it turns out, there are no permissible matches in your data example with the windows you give. Just to illustrate how the code works, I have revised the age window to +/- 10 years, and the creat window to +/- 50. Those are probably too wide to call people in those windows clinically similar. I imagine that in your full data set, you will be able to use somewhat narrower windows than that. But it is likely that the windows you propose in #6 are too strict to match an adequate number of patients in your real data. Anyway, the code segregates those window definitions into three lines of code near the middle, so it is easy enough to make the changes in just those lines and experiment to see if you can find "the sweet spot" where you get a reasonable number of matches and the windows are small enough to make clinical sense.

              To run this code you will need to install the -rangejoin- command, written by Robert Picard. It is available from SSC. To use -rangejoin- you also need the -rangestat- command, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

              In #4, I neglected to ask you to include the patient identifier variable. Such a variable is needed here. So I've just created an arbitrary one early in the code. You presumably have such a variable already. So you can delete that line of the code, and then replace all references to variable patient_id with the name of your actual patient identifier variable.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input double CA125UmL float age double(creat sex ntprocobas)
              49.36  63.45517      48 1  1686
              21.26  47.81656  141.44 0  2851
                  .  62.77071  123.76 0 18853
               66.7  62.95688     107 0 16778
               5.29   53.4538   79.56 0 360.8
               3.03  50.27515    78.4 0 406.8
              69.31  48.62149    77.4 0 125.6
                  .  48.16427      51 0 340.9
              11.43  50.04244     122 0 10612
               7.71   58.5462  87.516 0   360
              288.4  63.48255     150 0 23014
              19.38  62.07529   117.6 1 962.5
               39.1  49.62902    88.4 0  1705
              11.22  47.47707      77 0 117.8
              45.98 33.054073     100 0  4924
               16.9  61.04312 104.312 0 257.5
              15.33  51.34291   79.56 0 957.6
              693.8  61.84805     600 0 35000
              12.63  63.19781 106.964 0  1055
                  .  40.89254   97.24 0 194.2
               9.54  40.60233    88.4 0  2711
                  .  42.59001     159 0  4991
              373.5 64.057495     125 0  5116
              65.07 37.659138  106.08 0  2233
              470.7  57.74127     213 0  5134
              52.11  55.95346      79 1  2394
              103.8  64.12868   79.56 0  1073
              13.83   57.4976    78.6 0  1101
              11.18  53.97673   97.24 0  1499
              320.6    42.705  106.08 0 16440
              98.41  51.47707 169.728 0  1722
                  .   54.3655  81.328 0     .
               9.61  58.29158    78.2 1  2938
              27.78  63.12115     121 0  3590
               6.97 64.257355 104.312 0  1524
              21.81  54.99795      74 0  2819
              10.82  62.77892      87 0  1401
              160.1  59.32101     110 0  2837
               5.92  51.64682    88.4 0     .
                7.2  61.75496      79 0 394.5
                  .  53.89733   70.72 0     .
              15.06  62.72964     141 0  3567
                  .  56.44353      53 0  5339
              104.9  63.66325   97.24 0  1246
               8.63  45.69473      56 0 475.3
              219.2  56.91992  141.44 0  3506
              49.53  58.04791     105 0  4038
              84.35  63.68241      56 0  5143
               7.61  61.60164 146.744 1  2575
               6.76   54.4668      61 1 295.7
               8.92   57.4319      87 1  1150
              175.7  43.50445      83 0  2603
              138.7  47.21424   107.9 0  8781
              289.1   60.1013     279 0 29639
              23.61  63.56468      93 0     .
              13.17  59.87132      79 0  2578
              88.48  58.05339      75 0  4957
              17.29  58.69678 112.268 0 848.7
              19.42  51.44422    91.3 0 84.89
                  .  59.46886  123.76 1     .
              36.72  55.10746      62 0 331.2
               9.55   62.5462      94 1  4665
              39.26  53.87269      64 1     .
              570.6   57.0705      91 0  2397
              173.2    57.577  85.748 0  1481
              24.27  51.25257      43 1  2985
                  .  51.84942  91.936 0     .
                  .  48.19986      92 0  2202
              64.49  54.46407     144 0  2686
               18.4  55.53456      81 0     .
              359.6  40.96646     152 0  4142
              162.7  62.35729  65.416 0  1208
                  .  63.49076      88 0  3896
               1366  63.56194  159.12 0  2601
               7.82  56.34771     120 0 184.1
              10.94  59.26899      54 0   175
              12.49  55.38672    88.4 0 41.58
              10.36  58.95414     113 0  1053
              12.62  62.83368      93 0 186.8
                  .  64.08214      95 1     .
                  .  52.69268   79.56 0     .
              17.23  63.23066      75 0  5396
              11.83   61.7358  77.792 0 12755
               6.11  62.93497 144.092 0 533.5
              242.9  59.03354      76 1  1365
              93.83 64.054756      97 0  9616
              19.21  57.63997  95.472 0 699.2
              28.24  35.14579   154.7 0  4121
              14.04  63.54004 934.388 0 11987
              15.72  56.55852      72 1  1832
                  .  55.24435       . 0     .
              260.8  52.06571   79.56 1  1947
              139.2  40.99932      84 0  1077
               9.29  54.67214    95.3 0 297.7
                  .  58.89391      96 1  2637
              13.71  43.61123      80 1 244.6
              19.53  62.87474      71 0  1649
              431.8 26.743326 109.616 0  4881
                279  50.81451      72 1  5414
              end
              label values sex sex
              label def sex 0 "male", modify
              label def sex 1 "female", modify
              
              gen long patient_id = _n    // SKIP THIS IF YOU ALREADY HAVE A patient_id VARIABLE
              
              //    MAKE A FILE OF JUST MALES
              preserve
              tempfile males
              keep if sex == 0
              save `males'
              
              //    AND NOW GET JUST THE FEMALES
              restore
              keep if sex == 1
              
              //    DEFINE WINDOW RADII
              //    YOU CAN CHANGE THE DEFINITIONS HERE 
              local age_window 10
              local ntprocobas_window 20
              local creat_window 50
              
              //    WILL FIRST JOIN RESTRICTING AGE
              rangejoin age -`age_window' `age_window' using `males'
              keep if !missing(patient_id_U)
              //    NOW RESTRICT ON NTPROCOBAS
              keep if abs(ntprocobas-ntprocobas_U) < `ntprocobas_window'
              //    NOW RESTRICT ON CREATININE
              keep if abs(creat-creat_U) < `creat_window'
              
              //    ASSUMING YOU WISH TO MATCH JUST ONE MALE TO EACH FEMALE
              set seed 1234 // OR YOUR FAVORITE RANDOM NUMBER SEED
              gen double shuffle = runiform()
              by patient_id (shuffle), sort: keep if _n == 1

              By the way, looking at the distribution of the ntprocobas variable, which is very skewed, you might consider log-transforming it for purposes of matching. (Or, equivalently, for this variable base the similarity criterion on a ratio rather than a difference in the untransformed variable.)
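
              A minimal sketch of the ratio-based criterion, assuming the pairs already carry ntprocobas and ntprocobas_U as they do after -rangejoin-. The toy values and the local macro ratio_window (a hypothetical tolerance of a factor of 1.25) are illustrative assumptions, not part of the original code and not a clinical recommendation.

              ```stata
              * Toy pairs standing in for the output of -rangejoin- (made-up values)
              clear
              input double(ntprocobas ntprocobas_U)
              1000 1200
              1000 1300
               100  120
               100  200
              end

              * Hypothetical tolerance: the two values may differ by a factor below 1.25
              local ratio_window 1.25

              * Same as requiring 1/1.25 < ntprocobas/ntprocobas_U < 1.25
              keep if abs(ln(ntprocobas) - ln(ntprocobas_U)) < ln(`ratio_window')
              list
              ```

              In the matching code, a line of this form would take the place of the -keep if- that restricts on ntprocobas. Pairs in which either value is missing or non-positive are dropped, because the log is then missing.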


              • #8
                Thank you! I will log-transform nt-proBNP. I already have a patient_id variable.

                I get the following error:
                rangejoin age -`age_window' `age_window' using `males'
                extra argument after keyvar low high: 10


                • #9
                  I can't reproduce the problem you are having. The code runs without difficulty on my setup:

                  Code:
                  . gen long patient_id = _n    // SKIP THIS IF YOU ALREADY HAVE A patient_id VARIABLE
                  
                  . 
                  . //    MAKE A FILE OF JUST MALES
                  . preserve
                  
                  . tempfile males
                  
                  . keep if sex == 0
                  (18 observations deleted)
                  
                  . save `males'
                  file C:\Users\CLYDES~1\AppData\Local\Temp\ST_1ed0_000002.tmp saved
                  
                  . 
                  . //    AND NOW GET JUST THE FEMALES
                  . restore
                  
                  . keep if sex == 1
                  (81 observations deleted)
                  
                  . 
                  . //    DEFINE WINDOW RADII
                  . //    YOU CAN CHANGE THE DEFINITIONS HERE 
                  . local age_window 10
                  
                  . local ntprocobas_window 20
                  
                  . local creat_window 50
                  
                  . 
                  . //    WILL FIRST JOIN RESTRICTING AGE
                  . rangejoin age -`age_window' `age_window' using `males'
                    (using rangestat version 1.1.1)
                  
                  . keep if !missing(patient_id_U)
                  (0 observations deleted)
                  
                  . //    NOW RESTRICT ON NTPROCOBAS
                  . keep if abs(ntprocobas-ntprocobas_U) < `ntprocobas_window'
                  (1,038 observations deleted)
                  
                  . //    NOW RESTRICT ON CREATININE
                  . keep if abs(creat-creat_U) < `creat_window'
                  (1 observation deleted)
                  
                  . 
                  . //    ASSUMING YOU WISH TO MATCH JUST ONE MALE TO EACH FEMALE
                  . set seed 1234 // OR YOUR FAVORITE RANDOM NUMBER SEED
                  
                  . gen double shuffle = runiform()
                  
                  . by patient_id (shuffle), sort: keep if _n == 1
                  (0 observations deleted)
                  
                  . 
                  end of do-file

                  I don't know what to tell you. Are you sure this is exactly what you ran? Did you copy/paste that directly from your do-file here? I can reproduce that error message (including the specific number 10) if I put a space between the - and the first `age_window'. Did you have an extra space there?
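
                  To illustrate why a single space matters, here is a small sketch that uses only built-in macro functions. It assumes, based on the wording of the error message, that -rangejoin- expects exactly three arguments (keyvar low high) before -using-. The locals good and bad are hypothetical names for the two ways of typing the arguments.

                  ```stata
                  local age_window 10

                  * Arguments as intended: the low bound is the single token -10
                  local good age -`age_window' `age_window'
                  * Arguments with a stray space: "-" and "10" become separate tokens
                  local bad  age - `age_window' `age_window'

                  * Count the space-separated tokens in each version
                  local n_good : word count `good'
                  local n_bad  : word count `bad'

                  display "as intended: `n_good' tokens (`good')"
                  display "with space:  `n_bad' tokens (`bad')"
                  ```

                  The version with the space has one token more than keyvar low high allows, which would be consistent with the trailing 10 being reported as the extra argument.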


                  • #10
                    Hi Clyde,
                    As I was using Citrix to access Stata, I wasn't able to copy/paste, and I did indeed put a space in between. The next problem I encounter is a memory one.

                    Code:
                     rangejoin age -`age_window' `age_window' using `males'
                      (using rangestat version 1.1.1)
                    op. sys. refuses to provide memory
                    A whole different problem...


                    • #11
                      That's a more difficult problem to solve.

                      Apparently you're working with a very large data set (or your computer has very little memory). Given the nature of the matching you're trying to do, there isn't a natural way to split the data set into smaller segments and then match each of them separately and then put the results back together at the end. (That would work if your data set encompassed many different diagnoses and you were matching on the diagnosis: then you could do one diagnosis at a time.)

                      So there are a few things that may help. First, shut down all other applications on your computer before you run this: they compete with Stata for memory. Resist the temptation to browse the web while you're waiting for this to run: that, too, will take up memory resources that Stata may need.

                      Also, assuming your real data set contains more variables than just patient id, sex, creat, age, and ntprocobas, drop all of the other variables, drop observations with any missing values on these variables, then run -compress- and then try again.

                      Code:
                      //    TRIM DOWN THE DATA SET
                      keep patient_id sex creat age ntprocobas
                      drop if missing(patient_id, sex, creat, age, ntprocobas)
                      compress

                      That might shrink the data set enough for you to get through the matching. Then once you've got the matched pairs, you can bring back the other variables by -merge-ing.
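
                      A rough sketch of that -merge- step, using toy data with made-up values. It assumes the full data set was saved before trimming (here in a tempfile called full, a hypothetical name), that CA125UmL stands in for the variables that were dropped, and that the matched file has one row per female with the male's identifier in patient_id_U. The names female_id, male_id, CA125UmL_F, and CA125UmL_M are illustrative.

                      ```stata
                      * Toy "full" data set, saved before trimming (made-up values)
                      clear
                      input long patient_id byte sex double CA125UmL
                      1 1 49.4
                      2 0 21.3
                      3 0 66.7
                      4 1 19.4
                      end
                      tempfile full
                      save `full'

                      * Toy matched pairs: female in patient_id, male in patient_id_U
                      clear
                      input long(patient_id patient_id_U)
                      1 2
                      4 3
                      end

                      * Bring back the female's extra variable
                      merge 1:1 patient_id using `full', keepusing(CA125UmL) keep(match) nogenerate
                      rename (patient_id CA125UmL) (female_id CA125UmL_F)

                      * Bring back the male's extra variable; m:1 because one male
                      * may have been matched to more than one female
                      rename patient_id_U patient_id
                      merge m:1 patient_id using `full', keepusing(CA125UmL) keep(match) nogenerate
                      rename (patient_id CA125UmL) (male_id CA125UmL_M)

                      list female_id male_id CA125UmL_F CA125UmL_M
                      ```

                      The result has one row per matched pair, with the female's and the male's values side by side.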

                      If that doesn't work, you might benefit by changing the order in which the conditions for the three match variables are imposed. I just chose to do age first (with -rangejoin-) arbitrarily: I didn't anticipate you would be up against memory limits. But if you can figure out which of the matching criteria is most difficult to satisfy (i.e. eliminates the most potential matches) and do that one first, if it's sufficiently stringent, that might bring the memory burden down to what your computer can handle. The first match must always be done with -rangejoin-, and then the other two variables are handled with -keep if- commands. It is during and immediately after the -rangejoin- that the memory requirements are greatest: they are roughly proportional to the square of the size of your data set. Once you make it through -rangejoin- the memory requirements only shrink from there.
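
                      As a sketch of that reordering, suppose ntprocobas turned out to be the most restrictive criterion (an assumption; check which one it is in your own data). The toy data below are made up, and the window values are the ones from #7. Like the earlier code, it requires -rangejoin- and -rangestat- from SSC.

                      ```stata
                      * Requires: ssc install rangestat and ssc install rangejoin
                      * Toy data (made-up values): one female and three males
                      clear
                      input long patient_id byte sex double(age creat ntprocobas)
                      1 1 60 80 1000
                      2 0 61 90 1010
                      3 0 62 85 5000
                      4 0 45 82 1005
                      end

                      * Split into a file of males and an in-memory set of females
                      preserve
                      tempfile males
                      keep if sex == 0
                      save `males'
                      restore
                      keep if sex == 1

                      local age_window 10
                      local ntprocobas_window 20
                      local creat_window 50

                      * Join on the most restrictive variable first
                      rangejoin ntprocobas -`ntprocobas_window' `ntprocobas_window' using `males'
                      keep if !missing(patient_id_U)
                      * Then restrict on the remaining two variables
                      keep if abs(age - age_U) < `age_window'
                      keep if abs(creat - creat_U) < `creat_window'

                      list patient_id patient_id_U
                      ```

                      In this toy example the join produces two candidate pairs instead of three, and the age restriction then leaves only the pair of patients 1 and 2.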

                      If doing all of the above still leaves you with inadequate memory, then I think you will have to find somebody who has a computer with more memory and Stata loaded on it who will let you run it there.


                      • #12
                        It all worked! I used a different computer and trimmed down the dataset. It is really big, indeed.
                        Thank you so much. Now that I have run the code you gave me, how can I visualize the results?


                        • #13
                          I think you need to be a little more specific about just what results you want to visualize. You now have a matched-pair data set, and there are many things you might wish to look at.
