  • Extracting Specific String Elements of a String Variable

    Hello All,

    I would like to extract certain specified string components of a string variable so that I can use the tab var, gen(x1) command to create a dummy variable across all components.

    The -dataex- is as follows:

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str12 county str64 contest
    "LENOIR" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
    "LENOIR" "REGISTER OF DEEDS"
    "LENOIR" "SECRETARY OF STATE"
    "LENOIR" "SOIL AND WATER CONSERVATION DISTRICT"
    "LENOIR" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
    "LENOIR" "SUPERIOR COURT JUDGE DISTRICT"
    "LENOIR" "SUPREME COURT ASSOCIATE JUSTICE"
    "LENOIR" "TREASURER"
    "LENOIR" "US HOUSE OF REPRESENTATIVES DISTRICT"
    "LENOIR" "US SENATE"
    "LINCOLN" "ATTORNEY GENERAL"
    "LINCOLN" "AUDITOR"
    "LINCOLN" "SCHOOL BOARD"
    "LINCOLN" "SCHOOL BOARD IRONTON DISTRICT"
    "LINCOLN" "SCHOOL BOARD LINCOLNTON DISTRICT"
    "LINCOLN" "SCHOOL BOARD NORTH BROOK DISTRICT"
    "LINCOLN" "COMMISSIONER OF AGRICULTURE"
    "LINCOLN" "COMMISSIONER OF INSURANCE"
    "LINCOLN" "COMMISSIONER OF LABOR"
    "LINCOLN" "COUNTY COMMISSIONER"
    "LINCOLN" "COURT OF APPEALS JUDGE"
    "LINCOLN" "DISTRICT COURT JUDGE DISTRICT"
    "LINCOLN" "GOVERNOR"
    "LINCOLN" "LIEUTENANT GOVERNOR"
    "LINCOLN" "NC HOUSE OF REPRESENTATIVES DISTRICT"
    "LINCOLN" "NC STATE SENATE DISTRICT"
    "LINCOLN" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
    "LINCOLN" "SECRETARY OF STATE"
    "LINCOLN" "SOIL AND WATER CONSERVATION DISTRICT"
    "LINCOLN" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
    "LINCOLN" "SUPREME COURT ASSOCIATE JUSTICE"
    "LINCOLN" "TREASURER"
    "LINCOLN" "US HOUSE OF REPRESENTATIVES DISTRICT"
    "LINCOLN" "US SENATE"
    "MACON" "ATTORNEY GENERAL"
    "MACON" "AUDITOR"
    "MACON" "COMMISSIONER OF AGRICULTURE"
    "MACON" "COMMISSIONER OF INSURANCE"
    "MACON" "COMMISSIONER OF LABOR"
    "MACON" "COUNTY COMMISSIONER II"
    "MACON" "COUNTY COMMISSIONER III"
    "MACON" "COURT OF APPEALS JUDGE"
    "MACON" "DISTRICT COURT JUDGE DISTRICT"
    "MACON" "GOVERNOR"
    "MACON" "LIEUTENANT GOVERNOR"
    "MACON" "NC HOUSE OF REPRESENTATIVES DISTRICT"
    "MACON" "NC STATE SENATE DISTRICT"
    "MACON" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
    "MACON" "SCHOOL BOARD DISTRICT I"
    "MACON" "SCHOOL BOARDI"
    "MACON" "SCHOOL BOARD DISTRICT IV"
    "MACON" "SECRETARY OF STATE"
    "MACON" "SOIL AND WATER CONSERVATION DISTRICT"
    "MACON" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
    "MACON" "SUPREME COURT ASSOCIATE JUSTICE"
    "MACON" "TREASURER"
    "MACON" "US HOUSE OF REPRESENTATIVES DISTRICT"
    "MACON" "US SENATE"
    "MADISON" "ATTORNEY GENERAL"
    "MADISON" "AUDITOR"
    "MADISON" "COMMISSIONER OF AGRICULTURE"
    "MADISON" "COMMISSIONER OF INSURANCE"
    "MADISON" "COMMISSIONER OF LABOR"
    "MADISON" "COUNTY COMMISSIONER"
    "MADISON" "COURT OF APPEALS JUDGE"
    "MADISON" "DISTRICT COURT JUDGE DISTRICT"
    "MADISON" "GOVERNOR"
    "MADISON" "LIEUTENANT GOVERNOR"
    "MADISON" "NC HOUSE OF REPRESENTATIVES DISTRICT"
    "MADISON" "NC STATE SENATE DISTRICT"
    "MADISON" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
    "MADISON" "REGISTER OF DEEDS"
    "MADISON" "SECRETARY OF STATE"
    "MADISON" "SOIL AND WATER CONSERVATION DISTRICT"
    "MADISON" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
    "MADISON" "SUPREME COURT ASSOCIATE JUSTICE"
    "MADISON" "TREASURER"
    "MADISON" "US HOUSE OF REPRESENTATIVES DISTRICT"
    "MADISON" "US SENATE"
    "MARTIN" "ATTORNEY GENERAL"
    "MARTIN" "AUDITOR"
    "MARTIN" "BOARD OF COMMISSIONERS EASTERN DISTRICT"
    "MARTIN" "SCHOOL BOARD"
    "MARTIN" "COMMISSIONER OF AGRICULTURE"
    "MARTIN" "COMMISSIONER OF INSURANCE"
    "MARTIN" "COMMISSIONER OF LABOR"
    "MARTIN" "COURT OF APPEALS JUDGE"
    "MARTIN" "DISTRICT COURT JUDGE DISTRICT"
    "MARTIN" "GOVERNOR"
    "MARTIN" "LIEUTENANT GOVERNOR"
    "MARTIN" "NC HOUSE OF REPRESENTATIVES DISTRICT"
    "MARTIN" "NC STATE SENATE DISTRICT"
    "MARTIN" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
    "MARTIN" "REGISTER OF DEEDS"
    "MARTIN" "SECRETARY OF STATE"
    "MARTIN" "SOIL AND WATER CONSERVATION DISTRICT"
    "MARTIN" "SUPERINTENDENT OF PUBLIC INSTRUCTION"
    "MARTIN" "SUPREME COURT ASSOCIATE JUSTICE"
    "MARTIN" "TREASURER"
    "MARTIN" "US HOUSE OF REPRESENTATIVES DISTRICT"
    "MARTIN" "US SENATE"
    "MCDOWELL" "ATTORNEY GENERAL"
    "MCDOWELL" "AUDITOR"
    "MCDOWELL" "SCHOOL BOARD MARION DISTRICT"
    "MCDOWELL" "SCHOOL BOARD NORTH COVE DISTRICT"
    "MCDOWELL" "SCHOOL BOARD OLD FORT DISTRICT"
    "MCDOWELL" "COMMISSIONER OF AGRICULTURE"
    "MCDOWELL" "COMMISSIONER OF INSURANCE"
    "MCDOWELL" "COMMISSIONER OF LABOR"
    "MCDOWELL" "COUNTY COMMISSIONER"
    "MCDOWELL" "COURT OF APPEALS JUDGE"
    "MCDOWELL" "GOVERNOR"
    "MCDOWELL" "LIEUTENANT GOVERNOR"
    "MCDOWELL" "NC HOUSE OF REPRESENTATIVES DISTRICT"
    "MCDOWELL" "NC STATE SENATE DISTRICT"
    "MCDOWELL" "PRESIDENT AND VICE PRESIDENT OF THE UNITED STATES"
    "MCDOWELL" "REGISTER OF DEEDS"
    end

    The issue is that I would like to remove the extraneous parts of the contest variable that make its values incomparable across counties (i.e., things like "Ironton District", "Marion District" and " I", " II", " III", etc.). I will eventually use this variable with tab contest, gen(contest_) in order to construct dummies that indicate whether a particular type of election took place in each county.

    What I have been doing so far is using the subinstr() function, which I read about in a different help file, to remove each extraneous element of the contest name individually, i.e. repeatedly running "replace contest = subinstr(contest, " IRONTON DISTRICT", "", .)". However, there are thousands of lines that look like this, across multiple years, and there is no consistency in the extra components found before or after the part of the variable I want to keep. (Yes, sometimes the part I am trying to get rid of comes before rather than after, even though this data sample contains no examples of that.) I am realizing it will take me a very, very long time to clean the data this way. In this example, most of the issues occur after "SCHOOL BOARD" and "COUNTY COMMISSIONER" type observations, but this is not always the case either.

    Is there any way to tell Stata that I would like to keep any certain specified elements of a string and to scrap the rest, instead of going through one by one and eliminating the extraneous components? I have also tried to use the strkeep function, but I'm not sure how to apply it to this particular situation, if it would apply at all.

    Thanks very much!

    (This is my first time posting, and I have tried to follow all the rules, but apologies if I have left something out -- happy to edit or provide further clarification as would be helpful.)

  • #2
    I would like to keep any certain specified elements of a string and to scrap the rest,
    Code:
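    * Work on a copy so the original contest names are preserved
    gen contest2 = contest

    * Contest types to keep: each in double quotes, wrapped in compound quotes
    local list `" "SCHOOL BOARD" "COUNTY COMMISSIONER" "'

    foreach el of local list {
        * Keep only the listed element when the contest name begins with it
        replace contest2 = "`el'" if strpos(contest,"`el'") == 1
    }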
    Code:
    . list contest* if contest != contest2
    
         +---------------------------------------------------------+
         |                           contest              contest2 |
         |---------------------------------------------------------|
     14. |     SCHOOL BOARD IRONTON DISTRICT          SCHOOL BOARD |
     15. |  SCHOOL BOARD LINCOLNTON DISTRICT          SCHOOL BOARD |
     16. | SCHOOL BOARD NORTH BROOK DISTRICT          SCHOOL BOARD |
     40. |            COUNTY COMMISSIONER II   COUNTY COMMISSIONER |
     41. |           COUNTY COMMISSIONER III   COUNTY COMMISSIONER |
         |---------------------------------------------------------|
     49. |           SCHOOL BOARD DISTRICT I          SCHOOL BOARD |
     50. |                     SCHOOL BOARDI          SCHOOL BOARD |
     51. |          SCHOOL BOARD DISTRICT IV          SCHOOL BOARD |
    104. |      SCHOOL BOARD MARION DISTRICT          SCHOOL BOARD |
    105. |  SCHOOL BOARD NORTH COVE DISTRICT          SCHOOL BOARD |
         |---------------------------------------------------------|
    106. |    SCHOOL BOARD OLD FORT DISTRICT          SCHOOL BOARD |
         +---------------------------------------------------------+
    Last edited by Bjarte Aagnes; 22 Apr 2019, 03:10.
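
The code in #2 only matches names that begin with a listed element (strpos(...) == 1). The original post mentions that the unwanted part sometimes comes before the part to keep. A minimal sketch of one way to cover that case is to test for the element anywhere in the string (strpos(...) > 0). The toy data below are made up for illustration, and the value "MACON COUNTY COMMISSIONER II" is a hypothetical example of a leading extra component, not taken from the posted data.

```stata
* Toy data; the second row is a hypothetical name with a leading extra part
clear
input str40 contest
"SCHOOL BOARD IRONTON DISTRICT"
"MACON COUNTY COMMISSIONER II"
"TREASURER"
end

gen contest2 = contest

local list `" "SCHOOL BOARD" "COUNTY COMMISSIONER" "'

foreach el of local list {
    * strpos() returns 0 when the element is absent, so > 0 means "found anywhere"
    replace contest2 = "`el'" if strpos(contest, "`el'") > 0
}

list contest contest2
```

Matching anywhere is looser than matching at the start: a short element can be found inside unrelated contest names, and when a name contains more than one listed element, the one that comes later in the list wins. It is probably worth checking the result with something like list contest* if contest != contest2 before relying on it.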



    • #3
      Hello,

      Thank you very much for your message!

      This worked on the particular dataset I shared with you, but I am trying to apply this to new years and I am getting an error message from Stata.

      The -dataex- for the relevant year:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str12 county str76 contest
      "ALAMANCE"  "ALAMANCE-BURLINGTON BOARD OF EDUCATION"                    
      "ALAMANCE"  "BOARD OF COMMISSIONERS"                                    
      "ALAMANCE"  "BOARD OF COMMISSIONERS - Unexpired term ending 2012"       
      "ALAMANCE"  "CLERK OF SUPERIOR COURT"                                   
      "ALAMANCE"  "CONSTITUTIONAL AMENDMENT"                                  
      "ALAMANCE"  "COUNTY SALES AND USE TAX"                                  
      "ALAMANCE"  "COURT OF APPEALS JUDGE - Calabria Seat"                    
      "ALAMANCE"  "COURT OF APPEALS JUDGE - Elmore Seat"                      
      "ALAMANCE"  "COURT OF APPEALS JUDGE - Geer Seat"                        
      "ALAMANCE"  "COURT OF APPEALS JUDGE - IRV Contest"                      
      "ALAMANCE"  "COURT OF APPEALS JUDGE - Steelman Seat"                    
      "ALAMANCE"  "DISTRICT ATTORNEY DISTRICT 15A"                            
      "ALAMANCE"  "DISTRICT COURT JUDGE DISTRICT 15A - Allen Seat"            
      "ALAMANCE"  "NC HOUSE OF REPRESENTATIVES DISTRICT 63"                   
      "ALAMANCE"  "NC HOUSE OF REPRESENTATIVES DISTRICT 64"                   
      "ALAMANCE"  "NC STATE SENATE DISTRICT 24"                               
      "ALAMANCE"  "SHERIFF"                                                   
      "ALAMANCE"  "SOIL AND WATER CONSERVATION DISTRICT SUPERVISOR"           
      "ALAMANCE"  "SUPERIOR COURT JUDGE DISTRICT 15A"                         
      "ALAMANCE"  "SUPREME COURT ASSOCIATE JUSTICE - Brady Seat"              
      "ALAMANCE"  "US HOUSE OF REPRESENTATIVES DISTRICT 13"                   
      "ALAMANCE"  "US HOUSE OF REPRESENTATIVES DISTRICT 6"                    
      "ALAMANCE"  "US SENATE"                                                 
      "ALEXANDER" "BOARD OF COMMISSIONERS"                                    
      "ALEXANDER" "BOARD OF EDUCATION DISTRICT 1"                             
      "ALEXANDER" "BOARD OF EDUCATION DISTRICT 2"                             
      "ALEXANDER" "BOARD OF EDUCATION DISTRICT 4"                             
      "ALEXANDER" "CLERK OF SUPERIOR COURT"                                   
      "ALEXANDER" "CONSTITUTIONAL AMENDMENT"                                  
      "ALEXANDER" "COURT OF APPEALS JUDGE - Calabria Seat"                    
      "ALEXANDER" "COURT OF APPEALS JUDGE - Elmore Seat"                      
      "ALEXANDER" "COURT OF APPEALS JUDGE - Geer Seat"                        
      "ALEXANDER" "COURT OF APPEALS JUDGE - IRV Contest"                      
      "ALEXANDER" "COURT OF APPEALS JUDGE - Steelman Seat"                    
      "ALEXANDER" "DISTRICT COURT JUDGE DISTRICT 22A - Church Seat"           
      "ALEXANDER" "NC HOUSE OF REPRESENTATIVES DISTRICT 88"                   
      "ALEXANDER" "NC STATE SENATE DISTRICT 45"                               
      "ALEXANDER" "REGISTER OF DEEDS"                                         
      "ALEXANDER" "SHERIFF"                                                   
      "ALEXANDER" "SOIL AND WATER CONSERVATION DISTRICT SUPERVISOR"           
      "ALEXANDER" "SUPERIOR COURT JUDGE DISTRICT 22A"                         
      "ALEXANDER" "SUPREME COURT ASSOCIATE JUSTICE - Brady Seat"              
      "ALEXANDER" "US HOUSE OF REPRESENTATIVES DISTRICT 5"                    
      "ALEXANDER" "US SENATE"                                                 
      "ALLEGHANY" "BOARD OF COMMISSIONERS"                                    
      "ALLEGHANY" "BOARD OF EDUCATION"                                        
      "ALLEGHANY" "CLERK OF SUPERIOR COURT"                                   
      "ALLEGHANY" "CONSTITUTIONAL AMENDMENT"                                  
      "ALLEGHANY" "COUNTY SALES AND USE TAX"                                  
      "ALLEGHANY" "COURT OF APPEALS JUDGE - Calabria Seat"                    
      "ALLEGHANY" "COURT OF APPEALS JUDGE - Elmore Seat"                      
      "ALLEGHANY" "COURT OF APPEALS JUDGE - Geer Seat"                        
      "ALLEGHANY" "COURT OF APPEALS JUDGE - IRV Contest"                      
      "ALLEGHANY" "COURT OF APPEALS JUDGE - Steelman Seat"                    
      "ALLEGHANY" "DISTRICT ATTORNEY DISTRICT 23"                             
      "ALLEGHANY" "DISTRICT COURT JUDGE DISTRICT 23 - Byrd Seat"              
      "ALLEGHANY" "DISTRICT COURT JUDGE DISTRICT 23 - Duncan Seat"            
      "ALLEGHANY" "DISTRICT COURT JUDGE DISTRICT 23 - McLean Seat"            
      "ALLEGHANY" "NC HOUSE OF REPRESENTATIVES DISTRICT 90"                   
      "ALLEGHANY" "NC STATE SENATE DISTRICT 30"                               
      "ALLEGHANY" "SHERIFF"                                                   
      "ALLEGHANY" "SOIL AND WATER CONSERVATION DISTRICT SUPERVISOR"           
      "ALLEGHANY" "SUPREME COURT ASSOCIATE JUSTICE - Brady Seat"              
      "ALLEGHANY" "US HOUSE OF REPRESENTATIVES DISTRICT 5"                    
      "ALLEGHANY" "US SENATE"                                                 
      "ANSON"     "BOARD OF COMMISSIONERS DISTRICT 2"                         
      "ANSON"     "BOARD OF COMMISSIONERS DISTRICT 4"                         
      "ANSON"     "BOARD OF COMMISSIONERS DISTRICT 5"                         
      "ANSON"     "BOARD OF EDUCATION AT-LARGE"                               
      "ANSON"     "BOARD OF EDUCATION DISTRICT 2"                             
      "ANSON"     "BOARD OF EDUCATION DISTRICT 4"                             
      "ANSON"     "BOARD OF EDUCATION DISTRICT 5"                             
      "ANSON"     "BOARD OF EDUCATION DISTRICT 7 - Unexpired term ending 2012"
      "ANSON"     "CLERK OF SUPERIOR COURT"                                   
      "ANSON"     "CONSTITUTIONAL AMENDMENT"                                  
      "ANSON"     "COURT OF APPEALS JUDGE - Calabria Seat"                    
      "ANSON"     "COURT OF APPEALS JUDGE - Elmore Seat"                      
      "ANSON"     "COURT OF APPEALS JUDGE - Geer Seat"                        
      "ANSON"     "COURT OF APPEALS JUDGE - IRV Contest"                      
      "ANSON"     "COURT OF APPEALS JUDGE - Steelman Seat"                    
      "ANSON"     "DISTRICT ATTORNEY DISTRICT 20A"                            
      "ANSON"     "DISTRICT COURT JUDGE DISTRICT 20A - Brewer Seat"           
      "ANSON"     "DISTRICT COURT JUDGE DISTRICT 20A - Tucker Seat"           
      "ANSON"     "NC HOUSE OF REPRESENTATIVES DISTRICT 69"                   
      "ANSON"     "NC STATE SENATE DISTRICT 25"                               
      "ANSON"     "SHERIFF"                                                   
      "ANSON"     "SOIL AND WATER CONSERVATION DISTRICT SUPERVISOR"           
      "ANSON"     "SUPREME COURT ASSOCIATE JUSTICE - Brady Seat"              
      "ANSON"     "US HOUSE OF REPRESENTATIVES DISTRICT 8"                    
      "ANSON"     "US SENATE"                                                 
      "ASHE"      "BOARD OF COMMISSIONERS"                                    
      "ASHE"      "BOARD OF EDUCATION"                                        
      "ASHE"      "CLERK OF SUPERIOR COURT"                                   
      "ASHE"      "CONSTITUTIONAL AMENDMENT"                                  
      "ASHE"      "COURT OF APPEALS JUDGE - Calabria Seat"                    
      "ASHE"      "COURT OF APPEALS JUDGE - Elmore Seat"                      
      "ASHE"      "COURT OF APPEALS JUDGE - Geer Seat"                        
      "ASHE"      "COURT OF APPEALS JUDGE - IRV Contest"                      
      "ASHE"      "COURT OF APPEALS JUDGE - Steelman Seat"                    
      "ASHE"      "DISTRICT ATTORNEY DISTRICT 23"                             
      end

      How I modified your code for this particular year:


      gen contest2 = contest
      local list `" "BOARD OF EDUCATION" "CLERK OF SUPERIOR COURT" "CONSTITUTIONAL AMENDMENT" "COUNTY SALES AND USE TAX" "COURT OF APPEALS JUDGE" "BOARD OF COMMISSIONERS" "CITY COUNCIL" "CITY COUNCIL" "CONSTITUTIONAL AMENDMENT" "CORONER" "DISTRICT ATTORNEY" "DISTRICT COURT JUDGE" "SANITARY DISTRICT" "NC HOUSE" "NC STATE SENATE" "REGISTER OF DEEDS" "SOIL AND WATER CONSERVATION" "SUPERIOR COURT JUDGE" "SUPREME COURT ASSOCIATE JUSTICE" "CHARTER AMENDMENT" "US HOUSE" "US SENATE" "BOND" "'
      foreach el of local list {
      replace contest2 "`el'" if strpos(contest, "`el'") == 1
      }

      Stata's resulting error message:
      "BOARD OF EDUCATION invalid name
      r(198);


      Do you know why this code might not work for this dataset and whether there is anything I can do?

      Thank you again so much!

      Best,

      Alyssa



      • #4
        Return the missing equal sign to your code.
        Code:
        replace contest2 = "`el'" if strpos(contest, "`el'") == 1
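
For anyone hitting the same message: replace expects the form replace varname = exp, so without the equal sign Stata cannot parse the line and stops with the error reported in #3. The sketch below reproduces both versions on a single made-up observation; capture noisily is used here only so that a do-file keeps running after the failing line, and the local macro rc is a name chosen for this illustration.

```stata
* One-row toy dataset, for illustration only
clear
input str40 contest
"BOARD OF EDUCATION DISTRICT 1"
end

gen contest2 = contest
local el "BOARD OF EDUCATION"

* Without "=": the command fails; capture stores the return code in _rc
capture noisily replace contest2 "`el'" if strpos(contest, "`el'") == 1
local rc = _rc
display "return code from the line without the equal sign: `rc'"

* With "=": the replacement runs as intended
replace contest2 = "`el'" if strpos(contest, "`el'") == 1
list
```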



        • #5
          Sorry, yes, it works now. Thank you!!
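
Once the contest names are harmonized, the step described in the original post (tab contest, gen(contest_)) could look roughly like the sketch below. The three-row dataset is made up for illustration, and collapsing to one row per county with collapse (max) is an assumption about what "whether a particular type of election took place in each county" should mean in practice, not something stated in the thread.

```stata
* Toy data standing in for the cleaned contest names
clear
input str12 county str40 contest
"LINCOLN" "SCHOOL BOARD"
"LINCOLN" "TREASURER"
"MACON"   "TREASURER"
end

* One indicator per distinct contest, numbered in sorted order of the values:
* contest_1 = SCHOOL BOARD, contest_2 = TREASURER
tabulate contest, generate(contest_)

* One row per county: an indicator is 1 if the county had that contest at all
collapse (max) contest_*, by(county)

list
```

Note that collapse replaces the data in memory, so it is safer to run it on a copy or inside preserve / restore if the contest-level rows are still needed.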
