
  • Nationality and Name

    Hi!

    I have a dataset that has the names for around 17000 individuals, and I am wondering if there is any feasible way to infer the nationality of a person from his or her name. I know that it is possible to predict the race of a person from the name, but I am not sure what to do when it comes to nationality. It would be great if anyone has any prior experience or knowledge on how to proceed. I am afraid that I don't have any related information other than a person's name.

    Thanks a lot!

  • #2
    Well, let's try this in reverse. From your second name Zhao I would guess that you are Chinese. In doing that I am ignoring your first name, because I know that many Chinese people use different names: one given by their parents and one for mixing in Europe, North America, Australia, New Zealand, etc. If that's right, then software has to ignore some of the information to make a good guess.
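
    One way to act on this in software is to keep only the family-name part of each record before any lookup or classification. Below is a minimal Python sketch, assuming names are stored in given-name-first order (which, as noted, is not always true). The particle list and the example names are hypothetical and would need extending for real data.

```python
# Hypothetical set of lowercase surname particles; extend for your own data.
PARTICLES = {"van", "der", "den", "de", "la", "le", "von", "da", "di", "du", "del"}


def extract_surname(full_name: str) -> str:
    """Return the family-name part of a 'Given [Middle] Family' name.

    Assumes given-name-first order. Particles directly before the last
    token are kept, so 'Anna Van Der Burg' -> 'Van Der Burg'.
    """
    tokens = full_name.split()
    if not tokens:
        return ""
    surname = [tokens[-1]]
    # Walk backwards over particles, never consuming the first token (given name).
    i = len(tokens) - 2
    while i >= 1 and tokens[i].lower() in PARTICLES:
        surname.insert(0, tokens[i])
        i -= 1
    return " ".join(surname)


print(extract_surname("Hugo Zhao"))          # Zhao
print(extract_surname("Anna Van Der Burg"))  # Van Der Burg
```

    For names recorded family-name-first, this rule picks the wrong token, so such records would still need a flag or a manual check.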

    I don't know what people would guess from my name alone. I happen to be British, but my list of what others might reasonably guess would certainly include being a citizen of the US, Ireland, Australia, New Zealand, and quite possibly some other countries.

    Naturally, the last thing desirable or practicable is to go through 17000 individuals by hand, but in these ambiguous examples lies the nub of the problem. I dare say that some TLA (three-letter agency) draws upon a database, but I don't know of Stata code to do this. I hope someone has a much better answer.



    • #3
      Hi Nick,

      Thank you for your reply! Yes, you guessed correctly that I am Chinese. Also, I agree with the problem you brought up in your reply.

      In terms of my project, I am able to obtain a prediction of each person's race from his or her name. So if inferring the exact nationality of a person from the name turns out to be a hard and messy task, it would still be very helpful to know whether, for example, a person predicted as "White" has an American or a European name, and whether a person predicted as "Asian" has an East Asian or a South Asian name. As I understand it, this might be somewhat easier than inferring the exact nationality.

      To give a more concrete example: American last name vs. European last name, "Miller" vs. "Van Der Burg"; East Asian last name vs. South Asian last name, "Zhao" vs. "Bhargava"; and so on.
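
      For coarse groups like these, one common approach (not one proposed in this thread) is a classifier over character n-grams of the surname, since letter sequences such as "zh" or a final "-ava" may carry regional signal. Below is a minimal multinomial naive Bayes sketch in Python. It assumes you have a labelled list of surnames by region, which is not provided here; the two training names are just the examples from this post and are far too few for real use.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=3):
    """Character n-grams of a lowercased name padded with ^ and $ boundary markers."""
    padded = f"^{name.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.class_counts = Counter()           # number of training names per label
        self.ngram_counts = defaultdict(Counter)  # n-gram frequencies per label
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            grams = char_ngrams(name, self.n)
            self.class_counts[label] += 1
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def log_scores(self, name):
        """Log prior plus smoothed log likelihood of the name's n-grams, per label."""
        total_names = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for label, count in self.class_counts.items():
            total_grams = sum(self.ngram_counts[label].values())
            score = math.log(count / total_names)
            for g in char_ngrams(name, self.n):
                score += math.log((self.ngram_counts[label][g] + 1) / (total_grams + v))
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Toy training data: only the example names from this post (placeholder).
model = NgramNaiveBayes().fit(["Zhao", "Bhargava"], ["East Asian", "South Asian"])
print(model.predict("Zhao"))  # East Asian
```

      Add-one smoothing keeps an unseen n-gram from ruling out a label entirely, and working in log space avoids underflow on long names. The same class could be trained on American vs. European labels, applied within each predicted race group.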

      Maybe someone knows something about predicting American vs. European, East Asian vs. South Asian names, etc. If so, let me know. Thanks again!


      Last edited by Hugo Zhao; 03 Jun 2021, 05:17.
