  • Matching observations in two columns in Stata

    Hello, I am having a problem. I have two columns in my data set, countyname and countynamegini. I want to match all the observations in countyname with those in countynamegini. For example:

    countyname    countynamegini
    Barbour Al    Barbour Al
    Bibb Al       Etowah Al
    Etowah Al

    I want to match Etowah Al in countyname with Etowah Al in countynamegini, and delete the observation Bibb Al because it doesn't appear in countynamegini.

    I hope this makes sense. Thank you.

  • #2
    This question is really unclear and would definitely benefit from some example data. Check out the FAQ for information on how to provide sample data and how to ask questions in a way that helps us help you.

    Do you have a single data set where the variables countyname and countynamegini sometimes don't agree? You can easily drop observations where the two variables don't agree. The code for that would be:
    Code:
    drop if countyname != countynamegini
    But it sounds like you're trying to do something more complicated, in which case you're going to need to provide us with more information about what you want to happen to other observations in the data when you "match" the variables.
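
    To see what this row-wise comparison does, here is a minimal sketch using the three rows from the example in the first post (the str12 widths are an assumption made for illustration). Because the comparison is made within each row, only the Barbour Al row survives; Etowah Al is dropped even though it appears in both columns, just in different rows.

    ```stata
    * Three-row example from the first post; str12 widths are assumed
    clear
    input str12 countyname str12 countynamegini
    "Barbour Al" "Barbour Al"
    "Bibb Al"    "Etowah Al"
    "Etowah Al"  ""
    end

    * Row-wise comparison: drops every row where the two columns differ
    drop if countyname != countynamegini

    * Only the Barbour Al row is left
    list
    ```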



    • #3
      I hope this makes more sense. I have the variables countyname and countynamegini, and I want to match each observation in countyname with the corresponding one in countynamegini. For example, I want to keep the observations "Baldwin, AL" and "Calhoun, AL" in countyname and drop the other observations shown in the sample data below.

      [CODE]
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str29 countyname str42 countynamegini
      "UNITED STATES" "Baldwin County, Alabama"
      "ALABAMA" "Calhoun County, Alabama"
      "Autauga, AL" "Cullman County, Alabama"
      "Baldwin, AL" "DeKalb County, Alabama"
      "Barbour, AL" "Elmore County, Alabama"
      "Bibb, AL" "Etowah County, Alabama"
      "Blount, AL" "Houston County, Alabama"
      "Bullock, AL" "Jefferson County, Alabama"
      "Butler, AL" "Lauderdale County, Alabama"
      "Calhoun, AL" "Lee County, Alabama"
      end
      [/CODE]
      Last edited by Luis Mijares; 21 Nov 2019, 16:39.



      • #4
        Ok. This doesn't really clarify much. If this is really what your data looks like, and these variables appear in the same data set linked like this, then you probably have a data set that was created by a bad merge. In that case your best bet may be to ask the data creator to fix it (or, if you are the data creator, provide examples of your TWO original data sets and ask for help merging). Also, check whether your data has FIPS codes. Matching on string county names can be done, but in my experience differences in how names are recorded across data sources can make it hard.
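
        A rough sketch of matching on the names, under assumptions the thread does not confirm: that the two columns are really two independent lists sitting side by side, and that "Name, AL" and "Name County, Alabama" are the only two naming formats involved. The variable name key is made up for this example. The code splits off countynamegini, rewrites its names to look like countyname, and merges the two lists on the rewritten name, using the ten rows posted in #3.

        ```stata
        * Ten rows from the -dataex- output in #3
        clear
        input str29 countyname str42 countynamegini
        "UNITED STATES" "Baldwin County, Alabama"
        "ALABAMA" "Calhoun County, Alabama"
        "Autauga, AL" "Cullman County, Alabama"
        "Baldwin, AL" "DeKalb County, Alabama"
        "Barbour, AL" "Elmore County, Alabama"
        "Bibb, AL" "Etowah County, Alabama"
        "Blount, AL" "Houston County, Alabama"
        "Bullock, AL" "Jefferson County, Alabama"
        "Butler, AL" "Lauderdale County, Alabama"
        "Calhoun, AL" "Lee County, Alabama"
        end

        * Save the countynamegini list on its own, with a normalized key
        * (assumed rule: "X County, Alabama" becomes "x, al")
        preserve
        keep countynamegini
        drop if missing(countynamegini)
        generate key = lower(subinstr(countynamegini, " County, Alabama", ", AL", 1))
        duplicates drop key, force
        tempfile gini
        save `gini'
        restore

        * Build the same key on the countyname side and keep only the matches
        drop countynamegini
        generate key = lower(countyname)
        merge m:1 key using `gini', keep(match) nogenerate

        list countyname countynamegini
        ```

        Run this as a single do-file so the temporary file name stays defined. On these ten rows only "Baldwin, AL" and "Calhoun, AL" remain. A real data set would need a rewriting rule for every state and every naming quirk, which is why FIPS codes, if available, are the safer key.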



        • #5
          I don't have access to the original creator of the data set.



          • #6
            On the surface, I would not trust this data at all, but it's hard to say for sure what's going on since all I have to go on is two variables.

            Just to be absolutely sure, these two variables really appear in the same data set as you've presented them, right? These aren't variables from two different data sets that you've squished together in some way for the sake of example?

            Perhaps if you show us example data that includes the other variables, and describe very clearly what you want to happen to the values of those variables when you restructure your data, someone will be able to help you.

            Honestly, though, your time might be better spent finding or creating more reliable data. Unless you are absolutely sure you understand how the other variables in the data are related when these two variables are misaligned like this, I would not be inclined to trust any analysis performed with this data. Remember: garbage in, garbage out.

