How to match variables for analysis of matched pairs

Json Ochs

Join Date: Apr 2022

Posts: 5
#1

13 Apr 2022, 11:02

Hi Statalist,
Please excuse my ignorance in this post. I am a statistical newbie just getting off the ground with some clinical research and attempting a preliminary analysis.

I have data on baseline medical history and outcomes after a procedure. What I'd like to do is match patients based on the presence of a baseline illness (chronic kidney disease) then perform a conditional logistic regression to determine if the presence of other baseline illnesses (anemia) resulted in differing outcomes (acute kidney injury).

I've spent hours reading through the forum and watching YouTube videos (and have read Allan Acock's book) but am having trouble matching the variables. I have tried the joinby command, [egen match = group(ckd)], and a few others. If someone could point me in the right direction I would really appreciate it.

Best,
J
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

13 Apr 2022, 11:50

No need to apologize for being a newbie. We all were at one point.

Since you have already spent hours reading and watching videos, it is unlikely that writing a paragraph explaining the kinds of commands that are generally used for matching will be helpful. Better would be to show you the actual code that will accomplish it in your data. But that depends on how your data are set up. So you need to post back and use the -dataex- command to post a representative example (containing both observations with and without CKD, and with and without anemia). If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

Now, you won't want to include any identifying data in that example. So if your individual study subjects are identified by a medical record number or something like that, you should create a pseudo-identifier: run -egen pseudo_id = group(mrn)- (where mrn is replaced by the actual name of the medical record number variable or whatever the identifier is), and include pseudo_id but not the medical record number in the example you show. Since things like age are apparently not being used in the matching, don't include them either.
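
As a minimal sketch of those two steps together (mrn, ckd, anemia and aki are placeholder names, not variables from your data; substitute your own):

```stata
* Placeholder variable names: mrn, ckd, anemia, aki. Replace with your own.
* Create a de-identified ID: one integer per distinct value of mrn.
egen long pseudo_id = group(mrn)

* Post only the pseudo-identifier and the variables relevant to the question.
dataex pseudo_id ckd anemia aki
```

Listing the variables after -dataex- keeps the medical record number and anything else not named out of the posted example.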

You will also need to clarify a few things about the kind of matching you want to do. Do you want to match 1:1? Or do you want multiple patients with CKD for each non-CKD patient? Or the other way around? If multiple, how many? Another important thing you need to say is whether you want matching with or without replacement. (In matching without replacement, the same person cannot serve as a control for more than one case, whereas in matching with replacement they can. There is no statistical reason to prefer matching without replacement, and matching with replacement is easier to code. But some people have a strong aesthetic preference for matching without replacement.)
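
To illustrate why matching with replacement is the easier one to code, here is a rough sketch of 1:1 matching with replacement on a single matching variable. The names id, exposure and matchvar are hypothetical stand-ins, and this is only one of several ways to do it:

```stata
* Sketch only. id, exposure and matchvar are hypothetical placeholder names.
drop if missing(exposure, matchvar)
set seed 2022                            // arbitrary seed, for reproducibility

* Set aside the potential controls, keeping just what is needed to pair them.
preserve
keep if exposure == 0
rename id id_ctrl
keep matchvar id_ctrl
tempfile controls
save `controls'
restore

* Pair every case with every control that shares its value of matchvar.
keep if exposure == 1
joinby matchvar using `controls'

* Keep one randomly chosen control per case. The same control may be
* chosen for several cases, which is what "with replacement" means.
gen double shuffle = runiform()
by id (shuffle), sort: keep if _n == 1
drop shuffle
```

Cases with no control sharing their value of matchvar are dropped by -joinby- under its default behavior.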

Json Ochs

Join Date: Apr 2022
Posts: 5

16 Apr 2022, 14:48

Thank you so much. I appreciate your help and your time in doing so.

I'll be matching probably 1:1 because my numbers with and without CKD are roughly even (36 with, 41 without). And matching without replacement may be preferable because the outcomes even in my small sample are rare, but if there isn't much statistical reason for me to prefer matching without replacement then I'll definitely defer to you.

Below is the dataex where
hxckd = history of ckd
hxanemia = history of anemia
anemia = outcome of anemia (there are a few different outcomes I'd like to assess, but for purposes here this can be the outcome variable)

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(pseudo_id hxckd hxanemia anemia)
 1 0 1 0
 2 1 1 0
 3 0 0 1
 4 1 1 0
 5 1 1 0
 6 1 1 0
 7 1 0 1
 8 0 1 1
 9 0 0 0
10 0 1 0
11 0 0 0
12 0 0 1
13 1 1 0
14 1 1 0
15 0 0 1
16 1 1 0
17 0 0 1
18 0 0 1
19 1 0 0
20 1 0 1
21 0 1 0
22 1 0 0
23 1 0 0
24 0 0 0
25 0 1 0
26 0 1 0
27 0 0 0
28 1 1 0
29 0 0 0
30 1 1 0
31 0 1 0
32 0 1 0
33 0 0 1
34 1 1 0
35 0 1 0
36 1 0 0
37 0 1 0
38 1 0 0
39 1 1 1
40 1 0 0
41 1 1 0
42 1 1 0
43 1 0 0
44 1 0 1
45 1 1 0
46 0 0 0
47 1 1 0
48 0 0 1
49 0 0 0
50 0 0 0
51 1 0 0
52 1 1 0
53 0 0 0
54 1 1 0
55 0 1 0
56 0 1 0
57 1 1 0
58 . 1 0
59 1 1 0
60 0 0 0
61 0 0 0
62 1 1 0
63 0 0 1
64 0 0 0
65 1 1 1
66 0 1 0
67 1 0 0
68 0 0 0
69 0 0 0
70 0 0 0
71 0 1 1
72 0 1 0
73 1 0 0
74 1 1 0
75 1 1 0
76 0 0 0
77 0 1 1
78 0 0 0
end
label values hxanemia standard
label values hxckd standard
label values anemia standard
label def standard 0 "no", modify
label def standard 1 "yes", modify

Clyde Schechter

Join Date: Apr 2014
Posts: 29956

16 Apr 2022, 16:17

Code:

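//  DROP OBSERVATIONS THAT CANNOT BE MATCHED AND FIX THE RANDOM ORDER
drop if missing(hxckd, hxanemia)
set seed 2022 // ARBITRARY; ANY FIXED SEED MAKES THE MATCHING REPRODUCIBLE

//  ISOLATE THE NON-HX ANEMIA OBSERVATIONS (CONTROLS)
//  (THESE STEPS MIRROR THE CASE STEPS BELOW)
preserve
keep if hxanemia == 0
drop hxanemia
ds hxckd, not
rename (`r(varlist)') =_ctrl
gen double shuffle = runiform()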
sort shuffle
by hxckd (shuffle), sort: gen long priority = _n
drop shuffle
tempfile controls
save `controls'
restore

// ISOLATE THE HX ANEMIA OBSERVATIONS (CASES)
keep if hxanemia
drop hxanemia
ds hxckd, not
rename (`r(varlist)') =_case
gen double shuffle = runiform()
sort shuffle
by hxckd (shuffle), sort: gen long priority = _n
drop shuffle

//  NOW CREATE MATCHED PAIRS ON hxckd
merge 1:1 hxckd priority using `controls'

//  REORGANIZE THE DATA TO LINK CASES WITH MATCHED CONTROLS BY A COMMON
//  PAIR ID, BUT IN SEPARATE OBSERVATIONS
gen long pair_num = _n
ds *_case
local stubs `r(varlist)'
local stubs: subinstr local stubs "_case" "", all
reshape long `stubs', i(pair_num) j(case_ctrl) string
drop if missing(pseudo_id)
label define cc 0 "_ctrl" 1 "_case"
encode case_ctrl, gen(cc) label(cc)
drop case_ctrl

The above will, to the extent possible, assign each patient with hxanemia == 1 (a case) a control (hxanemia == 0) having the same value of hxckd, and no patient will serve as a control for more than one case. Each case and its matched control share a common value of the new variable pair_num. Because the numbers of cases and controls differ within each level of hxckd, not every case could be assigned a matched control, and, similarly, some potential controls were left over as matches to nobody.
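
A possible next step, sketched under the assumptions that anemia is the outcome of interest and that only complete pairs should enter the analysis (the variables _merge, pair_num and cc are the ones left behind by the matching code above):

```stata
* Sketch only: run after the matching code above.
keep if _merge == 3                     // drop unmatched cases and leftover controls
by pair_num, sort: assert _N == 2       // each remaining pair has exactly two members

* Check the pairing: cases and controls should be balanced on hxckd.
tab hxckd cc

* Conditional logistic regression of the outcome on case status
* (cc == 1 is hxanemia), conditioning on the matched pair.
clogit anemia i.cc, group(pair_num) or
```

-clogit- uses only pairs in which the outcome differs between the two members, so with rare outcomes and a sample this small the estimate may rest on very few pairs.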

Json Ochs

Join Date: Apr 2022

Posts: 5
#5

21 Apr 2022, 07:48

Thank you so much! This is fantastic. I really appreciate you laying this all out.