Hi all. I found two datasets: one with the 10,000 most popular Brazilian male first names and one with the 10,000 most popular Brazilian female first names. Each dataset has the names’ frequencies in the population and a variable called “rank” (1 for the most popular name, 2 for the second most popular, etc.). Each first name appears only once per dataset. This could be useful for some very basic text analysis with Brazilian names.
Let’s say I have a list of candidates running for office in one dataset (with first and last names) and the list with the most popular Brazilian female first names in another dataset.
Can I issue a command in Stata that attaches “rank” from the using dataset whenever a first name appears in the variable "name" in the master dataset, without first separating the first name from the last name(s)? Or must I first create a variable that contains only the first name (for instance, first_female_name) and then run the following?
Code:
merge m:1 first_female_name using using_dataset.dta, keepusing(rank)
Example: the master dataset has one observation with the name “CAMILA RODRIGUES”.
What I want: a command that gives me the “rank” for “CAMILA”.
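For reference, here is a rough sketch of the two-step approach I have in mind, in case it helps to see it spelled out. It is only an illustration under a few assumptions: the master file name candidates.dta is hypothetical, the first name is always the first space-delimited word of "name", and the using dataset stores its names in the same case and spelling in a variable that is also called first_female_name.

```stata
* Sketch only. candidates.dta is a hypothetical file name for the master data.
use candidates.dta, clear

* word(s, 1) returns the first space-delimited word of s,
* e.g. "CAMILA" from "CAMILA RODRIGUES"
generate first_female_name = word(name, 1)

* Many candidates can share one first name, and each first name appears
* once in the using dataset, hence m:1.
* Assumes the key variable in using_dataset.dta is also named first_female_name.
merge m:1 first_female_name using using_dataset.dta, ///
    keepusing(rank) keep(master match) nogenerate

* Inspect the result for the example observation
list name first_female_name rank if name == "CAMILA RODRIGUES"
```

With keep(master match), candidates whose first name is not in the list stay in the data with a missing “rank”, and names that appear only in the using dataset are dropped.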
If the question is not clear, please let me know. Thanks.