Hi all,
I have a dataset of 250,000 observations and there is a variable called countrynum providing a numeric code for the country.
I need to split the observations into groups of fewer than 800, because the command I then need to apply only runs on fewer than 800 observations at a time. Whenever a country has fewer than 800 observations, I'm fine using countrynum as the identifier: I temporarily keep if countrynum == i and run the command on that subset. However, certain countries have far more than 800 observations.
I would like to create an identifier that assigns a unique value to each subset of fewer than 800 observations (be it a whole country or a partition of one). That way, I can keep if identifier == i and run the command on each subset separately.
For the whole-country part, I simply copy countrynum into a variable called identifier, in three steps: gen identifier = 0, then bysort countrynum: egen freq = count(countrynum), then replace identifier = countrynum if freq < 800.
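To make that concrete, here is the whole-country step written out as a small runnable sketch. The toy data at the top (2,000 observations, two countries) is only an assumption for illustration and stands in for my real dataset.

```stata
* Toy data standing in for the real dataset (sizes are hypothetical)
clear
set obs 2000
gen countrynum = cond(_n <= 500, 1, 2)   // country 1: 500 obs, country 2: 1,500 obs

* Start with a placeholder value for every observation
gen identifier = 0

* Count the observations in each country
bysort countrynum: egen freq = count(countrynum)

* Small countries keep their own code; large ones stay at 0 for now
replace identifier = countrynum if freq < 800
```

After this, only the countries with 800 or more observations still have identifier equal to 0.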
For the partition part, I would like to split the observations of each large country into groups of fewer than 800, and assign a unique value in identifier to each subset.
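A rough sketch of the kind of thing I have in mind, which I have not tried on the real data: number the observations within each country in blocks of at most 799, then give every (country, block) pair its own value. The variable names chunk and subset_size are made up for this example, and the toy data is again only a stand-in.

```stata
* Toy data standing in for the real dataset (sizes are hypothetical)
clear
set obs 2000
gen countrynum = cond(_n <= 500, 1, 2)   // country 1: 500 obs, country 2: 1,500 obs

* Within each country, cut the observations into blocks of at most 799
bysort countrynum: gen chunk = ceil(_n / 799)

* One unique value per (country, block) pair
egen identifier = group(countrynum chunk)

* Check that every subset has fewer than 800 observations
bysort identifier: gen subset_size = _N
assert subset_size < 800
```

This would cover the whole-country case as well, since a country with fewer than 800 observations ends up in a single block. Note that the resulting identifier values are consecutive integers rather than the original country codes.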
Does anyone have ideas?
Thank you very much.