  • Parsing string variable using "split"

    Hello,
    I would like to separate a string variable.
    So far, the code I have been using is:

    Code:
    split geoname, parse(,) generate(newgeo)
    The variable in question, geoname, has the form "county name, state abbreviation". I would like one column with the county and one column with the state abbreviation. In some cases, however, a second county or independent city is also listed, so there are two commas. How do I specify the parsing location so that I still end up with one column that contains just the state abbreviation and one that contains the full county/independent city name? I'd also like to get rid of the * where it exists, but that's a secondary concern.

    Thank you!


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str48 geoname str43 newgeo1 str30 newgeo2 str4 newgeo3
    "Campbell + Lynchburg, VA*"                     "Campbell + Lynchburg"          " VA*"                           ""    
    "Campbell + Lynchburg, VA*"                     "Campbell + Lynchburg"          " VA*"                           ""    
    "Campbell + Lynchburg, VA*"                     "Campbell + Lynchburg"          " VA*"                           ""    
    "Carroll + Galax, VA*"                          "Carroll + Galax"               " VA*"                           ""    
    "Carroll + Galax, VA*"                          "Carroll + Galax"               " VA*"                           ""    
    "Carroll + Galax, VA*"                          "Carroll + Galax"               " VA*"                           ""    
    "Dinwiddie, Colonial Heights + Petersburg, VA*" "Dinwiddie"                     " Colonial Heights + Petersburg" " VA*"
    "Dinwiddie, Colonial Heights + Petersburg, VA*" "Dinwiddie"                     " Colonial Heights + Petersburg" " VA*"
    "Dinwiddie, Colonial Heights + Petersburg, VA*" "Dinwiddie"                     " Colonial Heights + Petersburg" " VA*"
    "Fairfax, Fairfax City + Falls Church, VA*"     "Fairfax"                       " Fairfax City + Falls Church"   " VA*"
    "Fairfax, Fairfax City + Falls Church, VA*"     "Fairfax"                       " Fairfax City + Falls Church"   " VA*"
    "Fairfax, Fairfax City + Falls Church, VA*"     "Fairfax"                       " Fairfax City + Falls Church"   " VA*"
    "Frederick + Winchester, VA*"                   "Frederick + Winchester"        " VA*"                           ""    
    "Frederick + Winchester, VA*"                   "Frederick + Winchester"        " VA*"                           ""    
    "Frederick + Winchester, VA*"                   "Frederick + Winchester"        " VA*"                           ""    
    "Greensville + Emporia, VA*"                    "Greensville + Emporia"         " VA*"                           ""    
    "Greensville + Emporia, VA*"                    "Greensville + Emporia"         " VA*"                           ""    
    "Greensville + Emporia, VA*"                    "Greensville + Emporia"         " VA*"                           ""    
    "Henry + Martinsville, VA*"                     "Henry + Martinsville"          " VA*"                           ""    
    "Henry + Martinsville, VA*"                     "Henry + Martinsville"          " VA*"                           ""    
    "Henry + Martinsville, VA*"                     "Henry + Martinsville"          " VA*"                           ""    
    "James City + Williamsburg, VA*"                "James City + Williamsburg"     " VA*"                           ""    
    "James City + Williamsburg, VA*"                "James City + Williamsburg"     " VA*"                           ""    
    "James City + Williamsburg, VA*"                "James City + Williamsburg"     " VA*"                           ""    
    "Montgomery + Radford, VA*"                     "Montgomery + Radford"          " VA*"                           ""    
    "Montgomery + Radford, VA*"                     "Montgomery + Radford"          " VA*"                           ""    
    "Montgomery + Radford, VA*"                     "Montgomery + Radford"          " VA*"                           ""    
    "Pittsylvania + Danville, VA*"                  "Pittsylvania + Danville"       " VA*"                           ""    
    "Pittsylvania + Danville, VA*"                  "Pittsylvania + Danville"       " VA*"                           ""    
    "Pittsylvania + Danville, VA*"                  "Pittsylvania + Danville"       " VA*"                           ""    
    "Prince George + Hopewell, VA*"                 "Prince George + Hopewell"      " VA*"                           ""    
    "Prince George + Hopewell, VA*"                 "Prince George + Hopewell"      " VA*"                           ""    
    "Prince George + Hopewell, VA*"                 "Prince George + Hopewell"      " VA*"                           ""    
    "Prince William, Manassas + Manassas Park, VA*" "Prince William"                " Manassas + Manassas Park"      " VA*"
    "Prince William, Manassas + Manassas Park, VA*" "Prince William"                " Manassas + Manassas Park"      " VA*"
    "Prince William, Manassas + Manassas Park, VA*" "Prince William"                " Manassas + Manassas Park"      " VA*"
    "Roanoke + Salem, VA*"                          "Roanoke + Salem"               " VA*"                           ""    
    "Roanoke + Salem, VA*"                          "Roanoke + Salem"               " VA*"                           ""    
    "Roanoke + Salem, VA*"                          "Roanoke + Salem"               " VA*"                           ""    
    "Rockbridge, Buena Vista + Lexington, VA*"      "Rockbridge"                    " Buena Vista + Lexington"       " VA*"
    "Rockbridge, Buena Vista + Lexington, VA*"      "Rockbridge"                    " Buena Vista + Lexington"       " VA*"
    "Rockbridge, Buena Vista + Lexington, VA*"      "Rockbridge"                    " Buena Vista + Lexington"       " VA*"
    "Rockingham + Harrisonburg, VA*"                "Rockingham + Harrisonburg"     " VA*"                           ""    
    "Rockingham + Harrisonburg, VA*"                "Rockingham + Harrisonburg"     " VA*"                           ""    
    "Rockingham + Harrisonburg, VA*"                "Rockingham + Harrisonburg"     " VA*"                           ""    
    
    end

  • #2
    Short answer is that you can't use split in that way. I am its putative author and so am open to refutation on this point.

    This may help otherwise:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str48 geoname
    "Campbell + Lynchburg, VA*"                    
    "Carroll + Galax, VA*"                         
    "Dinwiddie, Colonial Heights + Petersburg, VA*"
    "Fairfax, Fairfax City + Falls Church, VA*"    
    "Frederick + Winchester, VA*"                  
    "Greensville + Emporia, VA*"                   
    "Henry + Martinsville, VA*"                    
    "James City + Williamsburg, VA*"               
    "Montgomery + Radford, VA*"                    
    "Pittsylvania + Danville, VA*"                 
    "Prince George + Hopewell, VA*"                
    "Prince William, Manassas + Manassas Park, VA*"
    "Roanoke + Salem, VA*"                         
    "Rockbridge, Buena Vista + Lexington, VA*"     
    "Rockingham + Harrisonburg, VA*"               
    end
    
    
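    * state: everything after the last comma, e.g. " VA*"
    gen state = substr(geoname, strrpos(geoname, ",") + 1 , .)
    * city: the original string with the comma and state part removed
    gen city = subinstr(geoname, "," + state, "", .)
    * drop the asterisk and the leading blank from state
    replace state = strtrim(subinstr(state, "*", "", .))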
    
    l city state
    
         +--------------------------------------------------+
         |                                     city   state |
         |--------------------------------------------------|
      1. |                     Campbell + Lynchburg      VA |
      2. |                          Carroll + Galax      VA |
      3. | Dinwiddie, Colonial Heights + Petersburg      VA |
      4. |     Fairfax, Fairfax City + Falls Church      VA |
      5. |                   Frederick + Winchester      VA |
         |--------------------------------------------------|
      6. |                    Greensville + Emporia      VA |
      7. |                     Henry + Martinsville      VA |
      8. |                James City + Williamsburg      VA |
      9. |                     Montgomery + Radford      VA |
     10. |                  Pittsylvania + Danville      VA |
         |--------------------------------------------------|
     11. |                 Prince George + Hopewell      VA |
     12. | Prince William, Manassas + Manassas Park      VA |
     13. |                          Roanoke + Salem      VA |
     14. |      Rockbridge, Buena Vista + Lexington      VA |
     15. |                Rockingham + Harrisonburg      VA |
         +--------------------------------------------------+
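
A regular-expression variant is another possible route. This is only a sketch, not the method used above: it assumes every value ends in a comma, optional blanks, a two-letter uppercase state code, and an optional asterisk. The variable names city_rx and state_rx are made up for illustration.

```stata
clear
input str48 geoname
"Campbell + Lynchburg, VA*"
"Dinwiddie, Colonial Heights + Petersburg, VA*"
"Fairfax, Fairfax City + Falls Church, VA*"
end

* The greedy (.*) runs up to the last comma that is followed by
* a two-letter uppercase code; \*? allows an optional trailing asterisk.
* Assumption: all values follow this pattern; others get missing values.
gen city_rx  = regexs(1) if regexm(geoname, "^(.*), *([A-Z][A-Z])\*?$")
gen state_rx = regexs(2) if regexm(geoname, "^(.*), *([A-Z][A-Z])\*?$")

list city_rx state_rx
```

Values that do not fit the assumed pattern are left empty, which makes them easy to spot with something like count if missing(state_rx).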

    • #3
      Hi Nick,
      Thank you for your fast response. So that I understand the code: does the "+ 1" specify the location of the second comma? Or what does it do? I'm reading https://www.stata.com/manuals15/fnstringfunctions.pdf but am not yet sure I'd be able to come up with this solution myself in the future.

      • #4
        No; strrpos() finds the position of the last comma, and + 1 just takes you to the next character after that.

        The reason split won't work for you as you wish is that sometimes you want to parse on the first comma of one and sometimes you want the second comma of two. A way to solve that is to look for the last comma. split doesn't have a handle for that.
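
To see what each piece returns, the functions can be tried on a single value with display. This is a small illustration using one of the posted names; the local macro s is just a convenience introduced here.

```stata
local s "Dinwiddie, Colonial Heights + Petersburg, VA*"

* position of the first comma
display strpos("`s'", ",")

* position of the last comma
display strrpos("`s'", ",")

* everything from the character after the last comma to the end
display substr("`s'", strrpos("`s'", ",") + 1, .)
```

The last line shows the state part with its leading blank and asterisk still attached, which is why those are cleaned up in a separate step.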

        You could play at reversing the name and looking for the first comma, but the method I outlined works at least for the examples you gave.
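
The reversal idea might look roughly like the sketch below. It is not needed when strrpos() is available; the variable names rpos, state2 and city2 are invented here, and the code assumes each value contains at least one comma.

```stata
clear
input str48 geoname
"Campbell + Lynchburg, VA*"
"Dinwiddie, Colonial Heights + Petersburg, VA*"
end

* First comma in the reversed string = distance of the last comma
* from the end of the original string (counting the comma itself)
gen rpos = strpos(strreverse(geoname), ",")

* Everything after the last comma, with the asterisk and blanks removed
gen state2 = strtrim(subinstr(substr(geoname, strlen(geoname) - rpos + 2, .), "*", "", .))

* Everything before the last comma
gen city2 = substr(geoname, 1, strlen(geoname) - rpos)

list city2 state2
```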

        The Stata 17 manuals are accessible online. Conversely if you are using Stata 15, please specify that in your questions as some solutions may not apply to you.

        • #5
          "The Stata 17 manuals are accessible online."
          ... and the same PDFs are also installed as part of your Stata installation, accessible through Stata's Help menu, and by clicking on the links to the full documentation found in the output of the help command.
