  • How to create sex- and age-matched controls (1:4) in an NSCH dataset

    Hello, I am using the NSCH 2021 dataset and Stata 16.1.
    I have been trying to create sex- and age-matched controls (No to disorder A) at a 1:4 ratio for the cases (Yes to disorder A) in the dataset. I have searched Statalist and tried many approaches but could not figure out how to do it.
    For more details, I have included my full code here.


    My code_1 is below:

    gen adhd_ts=0
    replace adhd_ts=1 if adhd==1|ts==1
    replace adhd_ts=2 if adhd==1&ts==1
    // I would like to use [adhd_ts==0] as the control pool, and [adhd_ts==1] and [adhd_ts==2] as the cases (creating a separate matched control group for each case group)
    drop if adhd_ts==2 // trying to create a matched control for adhd_ts==1 cases first
    preserve
    keep if adhd_ts==0
    rename * *_control
    rename age_control age
    rename sex_control sex
    tempfile control
    save `control'
    restore
    keep if adhd_ts==1
    joinby age sex using `control'
    set seed 1234
    gen double shuffle = runiform()
    by hhid (shuffle), sort: keep if _n==1 // hhid is a unique ID for every observation; the cases were not renamed, so the case ID is hhid
    drop shuffle


    -> With this, I can make 1:1 randomly matched sex/age groups, but I think the cases and controls got mixed together in the result.

    My code_2 is below:
    gen adhd_ts=0
    replace adhd_ts=1 if adhd==1|ts==1
    replace adhd_ts=2 if adhd==1&ts==1
    // I would like to use [adhd_ts==0] as the control pool, and [adhd_ts==1] and [adhd_ts==2] as the cases (creating a separate matched control group for each case group)
    drop if adhd_ts==2 // trying to create a matched control for adhd_ts==1 cases first
    gen ok=(adhd_ts==0)
    gen random=runiform()
    sort ok random
    gen insample=ok&(_N-_n)<13302 // 13302 is 4 times the number of cases (adhd_ts==1)
    drop if insample==0&adhd_ts==0


    -> With this, I can make a randomly selected control group at a 1:4 ratio to the case group, but the controls are not age/sex matched.

    My code_3 is below:
    gen adhd_ts=0
    replace adhd_ts=1 if adhd==1|ts==1
    replace adhd_ts=2 if adhd==1&ts==1
    // I would like to use [adhd_ts==0] as the control pool, and [adhd_ts==1] and [adhd_ts==2] as the cases (creating a separate matched control group for each case group)
    drop if adhd_ts==2 // trying to create a matched control for adhd_ts==1 cases first
    calipmatch, generate(newvar) casevar(adhd_ts) maxmatches(4) calipermatch(sex age) caliperwidth(1 1)


    -> With this, I thought I had succeeded, but when I ran a t-test for age and a chi-square test for sex, there were significant differences between the case group and the control group. (Maybe due to the caliper width? But I don't think I can set it to 0 0.)

    My fourth attempt used kmatch, as below:
    kmatch em adhd_ts (sex age), gen
    but I don't think I applied it correctly, since nothing in the dataset changed except for the added _KM_ variables.

    Please share any advice or resources that could help me figure this out. Thank you in advance.


  • #2
    You don't show any example data. And I don't know what NSCH is, nor where I might find it. So I made up a toy data set that illustrates the approach and resembles those aspects of the data which you have described.

    Code:
    //    CREATE DEMONSTRATION DATA SET
    clear*
    set seed 1234
    set obs 3
    gen byte adhd_ts = _n-1
    expand 2 if adhd_ts > 0
    expand 2
    label define sex    0    "M"    1    "F"
    gen byte sex:sex = runiformint(0, 1)
    expand 5
    gen byte age_group = runiformint(1, 5)
    gen `c(obs_t)' id = _n
    
    ds age_group sex, not
    local vbles `r(varlist)'
    
    //    CREATE CONTROLS WITH ADHD_TS == 1
    preserve
    keep if adhd_ts == 1
    rename (`vbles') =_ctrl1
    tempfile controls1
    save `controls1'
    
    //    CREATE CONTROLS WITH ADHD_TS == 2
    restore, preserve
    keep if adhd_ts == 2
    rename (`vbles') =_ctrl2
    tempfile controls2
    save `controls2'
    
    //    ISOLATE THE CASES
    restore
    keep if adhd_ts == 0
    rename (`vbles') (=_case)
    tempfile cases
    save `cases'
    
    //    CREATE MATCHED PAIRS OF THE TWO TYPES OF CONTROLS
    use `controls1', clear
    joinby sex age_group using `controls2'
    tempfile controls12
    save `controls12'
    
    //    NOW MATCH EACH CASE WITH TWO CONTROLS12 PAIRS
    use `cases', clear
    joinby sex age_group using `controls12'
    gen double shuffle = runiform()
    by id_case (shuffle), sort: keep if _n <= 2
    drop shuffle
    
    gen `c(obs_t)' tuple = _n
    reshape long `vbles', i(tuple) j(group) string
    replace group = substr(group, 2, .)
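
    The same joinby-and-shuffle pattern can be sketched for the 1:4 design described in #1, taking adhd_ts == 1 as the cases and adhd_ts == 0 as the control pool. This is only a sketch on made-up data: the sample sizes, the age range, and the names id_case, id_ctrl and pool are assumptions for illustration, not features of the real data set.

    ```stata
    // Hypothetical toy data: 40 cases and 800 potential controls
    clear
    set seed 1234
    set obs 840
    gen long id = _n
    gen byte adhd_ts = (_n <= 40)        // 1 = case, 0 = potential control
    gen byte sex = runiformint(0, 1)
    gen byte age = runiformint(3, 17)

    // Save the control pool, keeping only the matching variables and a renamed ID
    preserve
    keep if adhd_ts == 0
    keep id sex age
    rename id id_ctrl
    tempfile pool
    save `pool'
    restore

    // Pair every case with every control of the same sex and age
    keep if adhd_ts == 1
    keep id sex age
    rename id id_case
    joinby sex age using `pool'

    // Keep up to 4 randomly chosen controls per case
    gen double shuffle = runiform()
    by id_case (shuffle), sort: keep if _n <= 4
    drop shuffle

    // Sanity check: no case has more than 4 controls
    bysort id_case: assert _N <= 4
    ```

    Two caveats about this sketch: a case with fewer than 4 eligible controls simply ends up with fewer (and a case with none is dropped by joinby), and the same control may be picked for more than one case, because each case draws from the full pool independently.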

    In the future, when asking for help with code, please use the -dataex- command and show example data. Although sometimes, as here, it is possible to give an answer that has a reasonable probability of being correct, this is usually not the case. Moreover, such answers are necessarily based on experience-based guesses or intuitions about the nature of your data. When those guesses are wrong, both you and the person trying to help you have wasted your time, because you end up with useless code. To avoid this, a -dataex- based example provides all of the information needed to develop and test a solution.

    If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
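
    As a rough illustration of a -dataex- call, using the auto data set that ships with Stata (the particular variables and the 1/5 range are arbitrary choices for this example):

    ```stata
    // Load a built-in example data set
    sysuse auto, clear

    // Show the first five observations of a few variables in a form
    // that others can paste straight into Stata
    dataex make price mpg foreign in 1/5
    ```

    Everything -dataex- writes to the Results window, including the [CODE] delimiters, can then be copied into the post as is.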

    An alternative to showing example data that can be helpful is to provide a link to a publicly available internet site that contains the data as a Stata data set.

    It is unwise to use abbreviations like NSCH here. While NSCH may be very familiar to everyone in your circle, this is an international multi-disciplinary forum. So jargon and abbreviations should be restricted to those that any university-educated person, in any field, anywhere in the world, would recognize. For anything else, either omit mention of it if it isn't central to understanding the problem (as here), or spell out the abbreviation and explain what it is.

    Added: It just dawns on me that I do not know when `c(obs_t)', which I use in a few places in the code, was introduced to Stata. If your version 16.1 doesn't recognize it, just replace it with long (unless the number of observations in your data set exceeds 1 billion, in which case, replace it with double.)
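
    A minimal sketch of that substitution, on a hypothetical 10-observation data set (the names id and id_alt are for illustration only). `c(obs_t)' expands to a storage type large enough to hold the current number of observations, so writing long explicitly should give the same values:

    ```stata
    clear
    set obs 10
    display "`c(obs_t)'"        // storage type Stata suggests for the current _N
    gen `c(obs_t)' id = _n      // type chosen via the c-class value
    gen long id_alt = _n        // explicit fallback type
    assert id == id_alt         // same values; only the storage type may differ
    ```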
    Last edited by Clyde Schechter; 14 Mar 2024, 19:09.


    • #3
      Thank you so much, this is very helpful! I'll be more careful so that my future posts are clearer. Thank you for your kind advice.
