  • 1:N matching on age and gender

    Hi!
    I use Stata 11.2.
    I have a large dataset: 250 observations are cases and 20,000 are controls. For every observation I have age and gender, and a variable named "case" identifies cases and controls. I would like to match cases and controls on age and gender, with a ratio of 4 controls to 1 case.

    ccmatch does not allow 1:N matching.
    nnmatch needs a "treatment" variable.
    vmatch does not let me fix the number of controls (it assigns a variable number of controls to each case).
    To my knowledge, a propensity score is not appropriate here. Moreover, I would like to perform exact matching.

    Do you have any ideas? I'm a beginner in Stata programming, but there may be a fairly simple solution I could write in my own do-file.
    Thanks in advance for your help
    Best
    Guillaume

  • #2
    I don't see why you couldn't do a 1:n match with the regular merge command, then delete extra matches.


    • #3
      Ben,

      It's unlikely that there is only one case for each combination of age and gender. So 1:n merge won't fly. And n:n merge will not give the kind of pairing desired here. I think he will have to do something like a -joinby age gender- between cases and controls. Unfortunately, just keeping four matches on each case won't be right, because the same controls will probably match with multiple cases. So to keep any given control from matching more than one case will require, I think, looping over cases, keeping four matches, and then deleting from the data set any observations from other cases that match to any of those same controls. And, by the way, hoping that there really are four matches available for every case when the dust settles.
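
      One loop-free way to get the same effect, sketched here on made-up data rather than the OP's file: put cases and controls in random order within each age-gender stratum and deal the controls out four at a time. This swaps the joinby-and-loop idea for a rank-based assignment. It assumes case is coded 1/0, and the names id, u, rank, setno, ncases and ncontrols are all hypothetical.

      ```stata
      * Toy data (hypothetical): 20 cases and 1,980 controls, case coded 1/0
      clear
      set seed 12345
      set obs 2000
      gen id = _n
      gen age = 40 + int(runiform()*10)
      gen gender = runiform() < 0.5
      gen case = _n <= 20

      * Random order within each age-gender stratum, separately for cases and controls
      gen double u = runiform()
      bysort age gender case (u): gen rank = _n

      * A case keeps its own rank as its matched-set number;
      * controls ranked 1-4 go to set 1, 5-8 to set 2, and so on
      gen setno = cond(case, rank, ceil(rank/4))

      * Drop controls beyond four per case (and all controls in strata with no case)
      bysort age gender: egen ncases = total(case)
      drop if !case & setno > ncases

      * How many controls did each case end up with?
      bysort age gender setno: egen ncontrols = total(!case)
      tabulate ncontrols if case
      ```

      A matched set is identified by age, gender and setno together. Because every control is a single observation assigned to a single set, no control can serve two cases. Any case showing fewer than 4 in the tabulation sits in a stratum that ran out of controls.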


      • #4
        Clyde -- good points. I was sloppy with my thinking when I first glanced at it. What about something like:
        Code:
        *============MAKE FAKE DATA
        clear
        set seed 1971
        set obs 20250
        gen id=_n
        gen age = int(uniform()*75)
        gen gender=round(uniform())
        gen case=1 if _n<251
        
        *=======STASH AWAY CASES, THEN GET CONTROLS
        *=======NEED TO RENAME SUBSTANTIVE VARIABLES AS WELL
        preserve
        keep if case==1
        rename id case_id
        save temp_cases, replace
        restore
        keep if case!=1
        rename id control_id
        drop case
        
        *===sort by random variable in case there were ordering effects
        gen trash=uniform()
        sort trash
        drop trash
        
        *=====NOW MERGE THEM
        joinby age gender using temp_cases
        
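        *======GETTING RID OF DUPLICATE MATCHES
        *======(each control is kept for one case only; which case is arbitrary)
        bysort control_id: keep if _n==1
        
        *===========KEEPING ONLY FIRST FOUR
        *===========(a case with fewer than four controls left keeps what it has)
        bysort case_id: keep if _n<5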
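
        Since nothing above guarantees four controls per case, a quick check on the result may be worthwhile. The sketch below uses a tiny made-up matched file with the same variable names (case_id, control_id); reused, ncontrols and onecase are hypothetical helper names.

        ```stata
        * Toy matched file: case 1 has four controls, case 2 only two
        clear
        input case_id control_id
        1 101
        1 102
        1 103
        1 104
        2 105
        2 106
        end

        * No control should serve more than one case (isid stops with an error if one does)
        isid control_id

        * Count controls per case and list the cases that fall short of four
        bysort case_id: gen ncontrols = _N
        egen onecase = tag(case_id)
        tabulate ncontrols if onecase
        list case_id ncontrols if onecase & ncontrols < 4
        ```

        On the real data, run only the part from isid onward. Note that a case left with no controls at all drops out of the file entirely, so it is also worth comparing the number of distinct case_id values with the 250 cases you started with.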


        • #5
          Does the OP want matching without replacement, as implied by Clyde and Ben? Or is matching with replacement more appropriate in the OP's situation?
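
          For comparison, here is a minimal sketch of the with-replacement variant on made-up data: each case draws up to four controls at random from its own age-gender stratum, and nothing stops the same control from being drawn by several cases. The data and the names cases and u are hypothetical.

          ```stata
          * Toy data (hypothetical): 10 cases and 490 controls
          clear
          set seed 2024
          set obs 500
          gen id = _n
          gen age = 50 + int(runiform()*5)
          gen gender = runiform() < 0.5
          gen case = _n <= 10

          * Put the cases in a temporary file and keep the controls in memory
          preserve
          keep if case
          rename id case_id
          drop case
          tempfile cases
          save `cases'
          restore
          keep if !case
          rename id control_id
          drop case

          * All case-control pairs that agree on age and gender
          joinby age gender using `cases'

          * Up to four randomly chosen controls per case; controls are not removed once used
          gen double u = runiform()
          bysort case_id (u): keep if _n <= 4

          * Shows how often a control was picked for more than one case
          duplicates report control_id
          ```

          Because it uses tempfile, the block has to be run in one go from a do-file. The only difference from matching without replacement is that there is no step restricting each control_id to a single case.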


          • #6
            Hi Rich, Ben and Clyde
            Thanks a lot, your do-file runs perfectly!
