  • Removing letters from numeric observations

    I have a dataset in which respondents were asked to enter, as a number, the year since which they have lived in their current dwelling, but some of them also typed text in those cells. Please see the example below. I would be grateful for advice on how to remove any letters from those observations. For example, instead of "26 ans", I would have "26". Thank you very much in advance!
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input strL M_tro
    "1999"  
    "2005"  
    "1962"  
    "2015"  
    "1995"  
    "2004"  
    "2017"  
    "2009"  
    "2015"  
    "2014"  
    "2001"  
    "1990"  
    "2015"  
    "2017"  
    "1972"  
    "1950"  
    "2018"  
    "2004"  
    "2015"  
    "2"    
    "1978"  
    "2021"  
    "2002"  
    "1989"  
    "2008"  
    "1990"  
    "2012"  
    "1999"  
    "1995"  
    "2010"  
    "2008"  
    "1982"  
    "1996"  
    "2016"  
    "1975"  
    "1999"  
    "1988"  
    "2017"  
    "16 ans"
    "1978"  
    "1985"  
    "1987"  
    "1971"  
    "2007"  
    "2018"  
    "2009"  
    "1989"  
    "1996"  
    "2017"  
    "2010"  
    "2019"  
    "1988"  
    "1975"  
    "1991"  
    "1992"  
    "2011"  
    "1990"  
    "2001"  
    "2013"  
    "2009"  
    "2002"  
    "1986"  
    " "    
    "21"    
    "2005"  
    "2003"  
    "5"    
    "2003"  
    "2005"  
    "23"    
    "2012"  
    "2004"  
    "2008"  
    "2014"  
    "2010"  
    "2004"  
    "1978"  
    "2019"  
    "2013"  
    "1984"  
    "1964"  
    "2016"  
    "1989"  
    "1969"  
    "1990"  
    "2010"  
    "2018"  
    "1998"  
    "2020"  
    "2020"  
    "2012"  
    "2015"  
    "27"    
    "1990"  
    "2015"  
    "1997"  
    "1983"  
    "60"    
    "2016"  
    "2006"  
    end
    Last edited by Nick Baradar; 23 Mar 2022, 14:26.

  • #2
    I am sorry, it is not an age; it is the year since which they have lived in their current dwelling.



    • #3
      Well, since it is a year, you are going to eventually need to -destring- it for it to be of any use. So you can use the -ignore- option of that command to get rid of all letters:

      Code:
      destring M_tro, replace ignore("`c(alpha)' `c(ALPHA)'")
      No doubt one could use regular expressions to clean up M_tro as a string first, but a year variable stored as a string will be useless anyway, so you might as well kill two birds with one stone.
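For anyone who wants to try this before overwriting the data, here is a minimal sketch on a made-up four-row sample. It assumes you would rather keep the original string, so it uses -generate()- in place of -replace-; the variable name M_tro_num is hypothetical.

```stata
* Toy sample, made up for illustration (not the full example data)
clear
input str10 M_tro
"1999"
"16 ans"
" "
"2005"
end

* c(alpha) and c(ALPHA) hold the lowercase and uppercase letters separated by
* spaces, so this ignore() list drops letters and blanks before converting
destring M_tro, generate(M_tro_num) ignore("`c(alpha)' `c(ALPHA)'")

* Review the rows whose original string was not purely digits
list M_tro M_tro_num if M_tro != string(M_tro_num)
```

In this sample "16 ans" becomes 16 and the blank entry becomes missing. A value such as 16 is clearly not a calendar year, so entries like that may still need separate handling after the conversion.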



      • #4
        Originally posted by Clyde Schechter
        Well, since it is a year, you are going to eventually need to -destring- it for it to be of any use. So you can use the -ignore- option of that command to get rid of all letters:

        Code:
        destring M_tro, replace ignore("`c(alpha)' `c(ALPHA)'")
        No doubt one could use regular expressions to clean up M_tro as a string first, but a year variable stored as a string will be useless anyway, so you might as well kill two birds with one stone.
        Thank you very much, appreciate your help!



        • #5
          Dear Clyde,
          I have a similar issue with another variable, and the code you advised does not seem to work for these observations. The message that I receive is "Offset2: contains characters not specified in ignore(); no replace".
          I would be really grateful if you could help with this issue as well. Please see the extracted data below:
          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input strL Offset2
          "100"        
          "10"         
          "100"        
          "10"         
          "50"         
          "250"        
          "0"          
          "50"         
          "0"          
          "0"          
          "250"        
          "99"         
          "50"         
          "0"          
          "0"          
          "200€"     
          "100"        
          "0"          
          "00"         
          "50"         
          "0"          
          "5"          
          "500"        
          "10"         
          "100"        
          "0"          
          "50"         
          "200"        
          "100"        
          "0"          
          "100"        
          "200"        
          "100"        
          "0"          
          "100"        
          "200"        
          "500"        
          "10"         
          "250"        
          "200"        
          "0"          
          "1"          
          "0"          
          "20"         
          "80"         
          "150"        
          "100"        
          "50"         
          "0"          
          "200"        
          "20"         
          "500"        
          "0"          
          "20"         
          "100"        
          "50"         
          "100"        
          "15"         
          "100"        
          "100"        
          "100"        
          "30"         
          "100"        
          "20"         
          "150"        
          "0"          
          "200"        
          "100"        
          "100"        
          "120"        
          "50"         
          "0"          
          "0"          
          "240"        
          "100"        
          "0"          
          "0"          
          "10"         
          "15"         
          "50"         
          "0"          
          "10"         
          "0"          
          "50"         
          "50"         
          "150"        
          "200"        
          "0"          
          "50"         
          "120"        
          "120 € max"
          "100"        
          "0"          
          "0"          
          "100"        
          "100"        
          "50"         
          "50"         
          "50"         
          "50"         
          end



          • #6
            So, this one contains a character that is neither a digit nor a letter: €. The code I showed removes only letters (and spaces), nothing else. You could extend it by adding € to the list in the -ignore()- option, and that will carry you through on this one. But if there are other variables you need to destring, who knows what other characters you may encounter? What you really need is a way of saying "just keep the digits." That is where regular expressions come in. Unfortunately, I have never been able to wrap my head around regular expression notation, so I'm going to step back and leave it to one of the many people on the Forum who are fluent in regular expressions to show you the code. That is the best way to do it.
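The quick extension described above might look like the following minimal sketch. It runs on a made-up sample that includes the two problem values from the example, and it assumes Stata 14 or later, where strings are Unicode and -ignore()- treats € as a single character by default. Offset2_num is a hypothetical name, and -generate()- is used only so that the original string survives.

```stata
* Toy sample, made up for illustration
clear
input str12 Offset2
"100"
"200€"
"120 € max"
"0"
end

* Same ignore() list as before, with the euro sign added
destring Offset2, generate(Offset2_num) ignore("`c(alpha)' `c(ALPHA)' €")
list
```

This handles € only; any other stray symbol in another variable would trigger the same "contains characters not specified in ignore()" message again.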



            • #7
              Code:
              gen wanted = real(ustrregexra(Offset2,"[^0-9]",""))
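For readers unfamiliar with the notation: "[^0-9]" matches any single character that is not a digit, -ustrregexra()- replaces every such match with an empty string, and -real()- converts what is left to a number. Below is a small sketch on made-up values, assuming Stata 14 or later (the -ustr*- functions are not available before that). The -list- line is just one way to review which observations were changed.

```stata
* Toy sample, made up for illustration
clear
input str12 Offset2
"100"
"200€"
"120 € max"
"00"
end

* Keep only the digits, then convert the remaining string to a number
gen wanted = real(ustrregexra(Offset2, "[^0-9]", ""))

* Show only the observations that contained a non-digit character
list Offset2 wanted if ustrregexm(Offset2, "[^0-9]")
```

Note that this pattern also strips decimal points and minus signs, so a value such as "12.5" would become 125. That is harmless for the whole-number amounts shown here but worth checking before reusing the line on other variables.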
