  • Drawing random sample for each observation where parameters vary for each observation

    Hi, I have a large dataset on course enrollment. Individual students take courses in different semesters, and observations are unique at the individual-semester-coursenum level. Individuals also have different graduation years ("cohort"). For each individual, I would like to select a random sample of individuals from their cohort, where the sample size ("total") differs across individuals. The best way I can think of is to loop through the individual observations, call the randomtag command for each one, and create a unique identifier for each randomtag result (possibly the unique identifier of the student). For example, the following code works:

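    preserve
    keep id cohort
    duplicates drop /* We now have one observation per individual */
    egen total = count(id), by(cohort) /* sample size; here, the number of individuals in the cohort */
    local N = _N
    set seed 1357
    g randomgroup = .
    sort id
    forvalues i = 1/`N' {
        /* locals instead of globals: these values are only needed inside the loop */
        local id = id[`i']
        local year = cohort[`i']
        local groupsize = total[`i']
        /* tag groupsize randomly chosen observations from this individual's cohort */
        randomtag if cohort == `year', count(`groupsize') g(selected`id')
        replace randomgroup = `id'*selected`id' if selected`id' == 1
    }
    sort id
    save randomgroups.dta, replace
    restore
    sort id
    merge m:1 id using randomgroups.dta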

    I'm wondering if there is a faster way to do this, rather than looping over individual observations to generate random samples one at a time. Thank you for your suggestions.