  • Splitting the dataset into groups of 799 observations based on a variable

    Hi all,

    I have a dataset of 250,000 observations and there is a variable called countrynum providing a numeric code for the country.

    I need to split the observations into groups of fewer than 800, because I then need to apply a command that only runs on fewer than 800 observations at a time. Whenever a country has fewer than 800 observations, I'm fine using countrynum as the identifier: I temporarily keep if countrynum == i and run the command on that subset. However, certain countries have far more than 800 observations.

    I would like to create an identifier that assigns a unique value to each subset of fewer than 800 observations (be it a whole country or a partition of it). That way, I can keep if identifier == i and run the command on each subset separately.

    For the whole-country part, I simply copy countrynum into a variable called identifier: gen identifier = 0, then bysort countrynum: egen freq = count(countrynum), and finally replace identifier = countrynum if freq < 800.

    For the partition part, I would like to split the observations of each country into groups of fewer than 800, and assign a unique value in "identifier" to each subset.

    Does anyone have ideas?

    Thank you very much.

  • #2
    Code:
    bys countrynum: gen part = floor((_n - 1)/799)
    egen identifier = group(countrynum part)
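
A fuller sketch of the same idea, in case it helps. Two things to watch with a plain floor(_n/800): the first block gets 799 rows but every later block gets exactly 800, and the block numbers restart at 0 in every country, so they are not unique across the dataset. The version below caps each block at 799 rows and numbers the blocks 1, 2, 3, ... across all countries. The variable names chunk, blockid and blocksize are made up for illustration, and summarize only stands in for whatever size-limited command you actually need to run.

```stata
* Sketch only: chunk, blockid and blocksize are hypothetical names.
* Within each country, rows 1-799 get chunk 0, rows 800-1598 get chunk 1, etc.
bysort countrynum: gen long chunk = floor((_n - 1)/799)

* Combine country and chunk into one running identifier (1, 2, 3, ...).
egen long blockid = group(countrynum chunk)

* Check that no block reaches 800 observations.
bysort blockid: gen long blocksize = _N
assert blocksize < 800

* Run the command block by block, restoring the full data each time.
summarize blockid, meanonly
local nblocks = r(max)
forvalues i = 1/`nblocks' {
    preserve
    keep if blockid == `i'
    summarize countrynum    // placeholder for the real command
    restore
}
```

Countries with fewer than 800 observations end up in a single block, so the separate whole-country step with freq should not be needed under this approach.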
