  • How to match variables for analysis of matched pairs

    Hi Statalist,
    Please excuse my ignorance in this post; I am a statistical newbie just getting off the ground with some clinical research and attempting to do a preliminary analysis.

    I have data on baseline medical history and outcomes after a procedure. What I'd like to do is match patients based on the presence of a baseline illness (chronic kidney disease) then perform a conditional logistic regression to determine if the presence of other baseline illnesses (anemia) resulted in differing outcomes (acute kidney injury).

    I've spent hours reading through the forum and watching YouTube videos (and have read Alan Acock's book) but am having trouble matching the variables. I have tried the joinby command, [egen match = group(ckd)], and a few others. If someone could point me in the right direction I would really appreciate it.

    Best,
    J

  • #2
    No need to apologize for being a newbie. We all were at one point.

    Since you have already spent hours reading and watching videos, it is unlikely that writing a paragraph explaining the kinds of commands that are generally used for matching will be helpful. Better would be to show you the actual code that will accomplish it in your data. But that depends on how your data are set up. So you need to post back and use the -dataex- command to post a representative example (containing both observations with and without CKD, and with and without anemia). If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Now, you won't want to include any identifying data in that example. So if your individual study subjects are identified by a medical record number or something like that, you should create a pseudo-identifier: run -egen pseudo_id = group(mrn)- (where mrn is replaced by the actual name of the medical record number variable or whatever the identifier is), and include pseudo_id but not the medical record number in the example you show. Since things like age are apparently not being used in the matching, don't include them either.
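
    As a minimal sketch of that step, using a three-row toy dataset: mrn and its values are placeholders made up for illustration, and the other variable names are only stand-ins for whatever your data contain.

```stata
* Toy data; mrn is a hypothetical identifier variable with made-up values
clear
input long mrn byte(hxckd hxanemia anemia)
884213 0 1 0
107755 1 1 0
532908 1 0 1
end

* Sequential pseudo-identifier: 1, 2, ... in the sort order of mrn
egen long pseudo_id = group(mrn)

* Remove the real identifier so it cannot end up in the posted example
drop mrn
order pseudo_id

* Generate the example to paste into the forum
dataex pseudo_id hxckd hxanemia anemia
```

    Note that group() numbers the subjects in the sort order of the original identifier, so the pseudo_id is not a random relabeling.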

    You will also need to clarify a few things about the kind of matching you want to do. Do you want to match 1:1? Or do you want multiple patients with CKD for each non-CKD patient? Or the other way around? If multiple, how many? Another important thing you need to say is whether you want matching with or without replacement. (In matching without replacement, the same person cannot serve as a control for more than one case, whereas in matching with replacement they can. There is no statistical reason to prefer matching without replacement, and matching with replacement is easier to code. But some people have a strong aesthetic preference for matching without replacement.)
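
    To make the with-replacement option concrete, here is a rough sketch on toy data. The variable names id, stratum, and is_case are hypothetical, and this is only one of several ways to code it.

```stata
* Toy data; id, stratum, and is_case are hypothetical names
clear
input byte(id stratum is_case)
1 0 1
2 1 1
3 1 1
4 0 0
5 1 0
end

* Set the controls aside in a temporary file
preserve
keep if is_case == 0
rename id ctrl_id
keep ctrl_id stratum
tempfile controls
save `controls'
restore

* Pair every case with every control in the same stratum
keep if is_case == 1
rename id case_id
keep case_id stratum
joinby stratum using `controls'

* Keep one randomly chosen control per case; the same control may be reused
set seed 2718
gen double shuffle = runiform()
by case_id (shuffle), sort: keep if _n == 1
drop shuffle
list
```

    In this toy example stratum 1 has two cases but only one control, so that control is matched to both cases. Matching without replacement would have to leave one of those cases unmatched.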



    • #3
      Thank you so much. I appreciate your help and your time in doing so.

      I'll be matching probably 1:1 because my numbers with and without CKD are roughly even (36 with, 41 without). And matching without replacement may be preferable because the outcomes even in my small sample are rare, but if there isn't much statistical reason for me to prefer matching without replacement then I'll definitely defer to you.

      Below is the dataex where
      hxckd = history of ckd
      hxanemia = history of anemia
      anemia = outcome of anemia (there are a few different outcomes I'd like to assess, but for purposes here this can be the outcome variable)


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(pseudo_id hxckd hxanemia anemia)
       1 0 1 0
       2 1 1 0
       3 0 0 1
       4 1 1 0
       5 1 1 0
       6 1 1 0
       7 1 0 1
       8 0 1 1
       9 0 0 0
      10 0 1 0
      11 0 0 0
      12 0 0 1
      13 1 1 0
      14 1 1 0
      15 0 0 1
      16 1 1 0
      17 0 0 1
      18 0 0 1
      19 1 0 0
      20 1 0 1
      21 0 1 0
      22 1 0 0
      23 1 0 0
      24 0 0 0
      25 0 1 0
      26 0 1 0
      27 0 0 0
      28 1 1 0
      29 0 0 0
      30 1 1 0
      31 0 1 0
      32 0 1 0
      33 0 0 1
      34 1 1 0
      35 0 1 0
      36 1 0 0
      37 0 1 0
      38 1 0 0
      39 1 1 1
      40 1 0 0
      41 1 1 0
      42 1 1 0
      43 1 0 0
      44 1 0 1
      45 1 1 0
      46 0 0 0
      47 1 1 0
      48 0 0 1
      49 0 0 0
      50 0 0 0
      51 1 0 0
      52 1 1 0
      53 0 0 0
      54 1 1 0
      55 0 1 0
      56 0 1 0
      57 1 1 0
      58 . 1 0
      59 1 1 0
      60 0 0 0
      61 0 0 0
      62 1 1 0
      63 0 0 1
      64 0 0 0
      65 1 1 1
      66 0 1 0
      67 1 0 0
      68 0 0 0
      69 0 0 0
      70 0 0 0
      71 0 1 1
      72 0 1 0
      73 1 0 0
      74 1 1 0
      75 1 1 0
      76 0 0 0
      77 0 1 1
      78 0 0 0
      end
      label values hxanemia standard
      label values hxckd standard
      label values anemia standard
      label def standard 0 "no", modify
      label def standard 1 "yes", modify



      • #4
        Code:
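        //  ISOLATE THE NON-HX ANEMIA OBSERVATIONS (CONTROLS)
        //  (these opening lines are assumed: they mirror the case block below)
        preserve
        keep if !hxanemia
        drop hxanemia
        ds hxckd, not
        rename (`r(varlist)') =_ctrl
        gen double shuffle = runiform()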
        sort shuffle
        by hxckd (shuffle), sort: gen long priority = _n
        drop shuffle
        tempfile controls
        save `controls'
        restore
        
        // ISOLATE THE HX ANEMIA OBSERVATIONS (CASES)
        keep if hxanemia
        drop hxanemia
        ds hxckd, not
        rename (`r(varlist)') =_case
        gen double shuffle = runiform()
        sort shuffle
        by hxckd (shuffle), sort: gen long priority = _n
        drop shuffle
        
        //  NOW CREATE MATCHED PAIRS ON hxckd
        merge 1:1 hxckd priority using `controls'
        
        //  REORGANIZE THE DATA TO LINK CASES WITH MATCHED CONTROLS BY A COMMON
        //  PAIR ID, BUT IN SEPARATE OBSERVATIONS
        gen long pair_num = _n
        ds *_case
        local stubs `r(varlist)'
        local stubs: subinstr local stubs "_case" "", all
        reshape long `stubs', i(pair_num) j(case_ctrl) string
        drop if missing(pseudo_id)
        label define cc 0 "_ctrl" 1 "_case"
        encode case_ctrl, gen(cc) label(cc)
        drop case_ctrl

        The above will, to the extent possible, assign each patient with hxanemia == 1 (a case) a control (hxanemia == 0) having the same value of hxckd, and no patient will serve as a control for more than one case. The two members of each matched pair share a common value of the new variable pair_num. Because the numbers of patients at each level of hxckd differ between the cases and the controls, not every case could be assigned a matched control, and, similarly, some potential controls were left over as matches to nobody.
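
        As a possible follow-up, and only a sketch: assuming the matched data produced by the code above are still in memory (including the _merge variable that -merge- created), something like the following would discard the unmatched patients, verify the pairing, and run the conditional logistic regression mentioned in #1. Using anemia as the outcome follows #3; with rare outcomes there may be few informative pairs.

```stata
* Assumes the matched data from the code above are in memory
keep if _merge == 3                  // drop cases and controls left unmatched

* Each remaining pair should hold exactly one control (cc == 0) and one case (cc == 1)
by pair_num (cc), sort: assert _N == 2
by pair_num (cc): assert cc[1] == 0 & cc[2] == 1

* Members of a pair were matched on hxckd, so it is constant within pair
by pair_num (cc): assert hxckd[1] == hxckd[2]

* Conditional logistic regression within pairs; cc marks the hxanemia patients
clogit anemia i.cc, group(pair_num) or
```

        Pairs whose two members have the same outcome contribute nothing to the conditional likelihood, and -clogit- drops them automatically.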



        • #5
          Thank you so much! This is fantastic. I really appreciate you laying this all out
