How to create a sex and age Matched Control (1:4) in a NSCH dataset

Yuji Choi

Join Date: Mar 2024

Posts: 2
#1

14 Mar 2024, 16:19

Hello, I am using the NSCH 2021 dataset and Stata 16.1.
I have been trying to create sex- and age-matched controls (No to disorder A) at a 1:4 ratio for the cases (Yes to disorder A) in the dataset. I have searched Statalist and tried many approaches, but I could not figure out how to do it.
For more detail, I have included my full code here.

My code_1 is below:

gen adhd_ts=0
replace adhd_ts=1 if adhd==1|ts==1
replace adhd_ts=2 if adhd==1&ts==1
// I would like to create matched control using [adhd_ts==0] as a pool, and [adhd_ts==1] and [adhd_ts==2] as control (create two control groups for each case group)
drop if adhd_ts==2 // trying to create a matched control for adhd_ts==1 cases first
preserve
keep if adhd_ts==0
rename * *_control
rename age_control age
rename sex_control sex
tempfile control
save `control'
restore
keep if adhd_ts==1
joinby age sex using `control'
set seed 1234
gen double shuffle = runiform()
by hhid_case (shuffle), sort:keep if _n==1 //hhid is unique number for every sample
drop shuffle

-> With this, I can create randomly selected 1:1 sex- and age-matched groups, but I think the cases and controls got mixed up in the result.

My code_2 is below:
gen adhd_ts=0
replace adhd_ts=1 if adhd==1|ts==1
replace adhd_ts=2 if adhd==1&ts==1
// I would like to create matched control using [adhd_ts==0] as a pool, and [adhd_ts==1] and [adhd_ts==2] as control (create two control groups for each case group)
drop if adhd_ts==2 // trying to create a matched control for adhd_ts==1 cases first
gen ok=(adhd_ts==0)
gen random=runiform()
sort ok random
gen insample=ok&(_N-_n)<13302 // 13302 is 4 times of the cases (adhd_ts==1)
drop if insample==0&adhd_ts==0

-> With this, I can create a randomly selected control group at a 1:4 ratio to the cases, but the controls are not age- or sex-matched.

My code_3 is below:
gen adhd_ts=0
replace adhd_ts=1 if adhd==1|ts==1
replace adhd_ts=2 if adhd==1&ts==1
// I would like to create matched control using [adhd_ts==0] as a pool, and [adhd_ts==1] and [adhd_ts==2] as control (create two control groups for each case group)
drop if adhd_ts==2 // trying to create a matched control for adhd_ts==1 cases first
calipmatch, generate(newvar) casevar(adhd_ts) maxmatches(4) calipermatch(sex age) caliperwidth(1 1)

-> With this, I thought I had succeeded, but a t-test for age and a chi-square test for sex both showed significant differences between the case group and the control group. (Maybe this is due to the caliper width, but I don't think I can set it to 0 0.)

My fourth attempt used kmatch, as below:
kmatch em adhd_ts (sex age), gen
but I don't think I applied it correctly, since nothing in the dataset changed except for the added _KM_ variables.

Please provide any advice or resources to help me to figure this out. Thank you in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

14 Mar 2024, 18:06

You don't show any example data. And I don't know what NSCH is, nor where I might find it. So I made up a toy data set that illustrates the approach and resembles those aspects of the data which you have described.

Code:

// CREATE DEMONSTRATION DATA SET
clear*
set seed 1234
set obs 3
gen byte adhd_ts = _n-1
expand 2 if adhd_ts > 0
expand 2
label define sex 0 "M" 1 "F"
gen byte sex:sex = runiformint(0, 1)
expand 5
gen byte age_group = runiformint(1, 5)
gen `c(obs_t)' id = _n

ds age_group sex, not
local vbles `r(varlist)'

// CREATE CONTROLS WITH ADHD_TS == 1
preserve
keep if adhd_ts == 1
rename (`vbles') =_ctrl1
tempfile controls1
save `controls1'

// CREATE CONTROLS WITH ADHD_TS == 2
restore, preserve
keep if adhd_ts == 2
rename (`vbles') =_ctrl2
tempfile controls2
save `controls2'

// ISOLATE THE CASES
restore
keep if adhd_ts == 0
rename (`vbles') (=_case)
tempfile cases
save `cases'

// CREATE MATCHED PAIRS OF THE TWO TYPES OF CONTROLS
use `controls1', clear
joinby sex age_group using `controls2'
tempfile controls12
save `controls12'

// NOW MATCH EACH CASE WITH TWO CONTROLS12 PAIRS
use `cases', clear
joinby sex age_group using `controls12'
gen double shuffle = runiform()
by id_case (shuffle), sort: keep if _n <= 2
drop shuffle
gen `c(obs_t)' tuple = _n
reshape long `vbles', i(tuple) j(group) string
replace group = substr(group, 2, .)
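
For the simpler layout described in #1 (adhd_ts == 1 as cases, adhd_ts == 0 as the control pool, up to four controls per case), the same -joinby- and shuffle idea might be sketched as follows. This is only an illustration under assumptions, not a tested solution for the real data: the toy data, the 10% case rate, and the variable names id, id_case, and id_ctrl are all made up here, and age is assumed to be recorded in whole years so that exact matching is meaningful.

```stata
// HYPOTHETICAL TOY DATA: id, sex, age, adhd_ts (0 = control pool, 1 = case)
clear
set seed 1234
set obs 1000
gen long id = _n
gen byte sex = runiformint(0, 1)
gen byte age = runiformint(3, 17)
gen byte adhd_ts = runiform() < 0.1

// SAVE THE CONTROL POOL, RENAMING EVERYTHING EXCEPT THE MATCHING KEYS
preserve
keep if adhd_ts == 0
rename (id adhd_ts) (id_ctrl adhd_ts_ctrl)
tempfile controls
save `controls'
restore

// KEEP THE CASES AND PAIR EACH WITH EVERY CONTROL OF THE SAME SEX AND AGE
keep if adhd_ts == 1
rename (id adhd_ts) (id_case adhd_ts_case)
joinby sex age using `controls'

// RANDOMLY KEEP UP TO FOUR CONTROLS PER CASE
gen double shuffle = runiform()
by id_case (shuffle), sort: keep if _n <= 4
drop shuffle
```

Two caveats about this sketch: a case with no control of the same sex and age is dropped by -joinby-, and the same control can be selected for more than one case (matching with replacement). If each control may be used only once, a different approach is needed.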

In the future, when asking for help with code, please use the -dataex- command and show example data. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted their time as you end up with useless code. To avoid this, a -dataex- based example provides all of the information needed to develop and test a solution.

If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
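
As a rough illustration of what a -dataex- call might look like, here is one using the auto data set that ships with Stata. The choice of variables and the 1/5 observation range are arbitrary assumptions for the example; in a real post you would name the variables relevant to your question.

```stata
// Load a data set that ships with Stata (stand-in for your own data)
sysuse auto, clear

// Show the first five observations of three variables in a form
// that others can copy and paste straight into Stata
dataex make price mpg in 1/5
```

The output of -dataex- is what gets pasted into the forum post, between code delimiters.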

An alternative to showing example data that can be helpful is to provide a link to a publicly available internet site that contains the data as a Stata data set.

It is unwise to use abbreviations like NSCH here. While NSCH may be very familiar to everyone in your circle, this is an international multi-disciplinary forum. So jargon and abbreviations should be restricted to those that any university-educated person, in any field, anywhere in the world, would recognize. For anything else, either omit mention of it if it isn't central to understanding the problem (as here), or spell out the abbreviation and explain what it is.

Added: It just dawns on me that I do not know when `c(obs_t)', which I use in a few places in the code, was introduced to Stata. If your version 16.1 doesn't recognize it, just replace it with long (unless the number of observations in your data set exceeds 1 billion, in which case, replace it with double.)
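
A minimal sketch of that substitution, assuming a data set with far fewer than 1 billion observations (the five-observation data set here is made up purely so the lines run on their own):

```stata
// Toy data set so the example is self-contained
clear
set obs 5

// Same commands as in the code above, with the storage type written out
gen long id = _n
gen long tuple = _n
```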

Last edited by Clyde Schechter; 14 Mar 2024, 18:09.
Yuji Choi

Join Date: Mar 2024

Posts: 2
#3

01 Apr 2024, 13:17

Thank you so much, this is very helpful!! I'll be more careful so that I can write clearer posts. Thank you for your kind advice.