Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • -itrim(trim())- function: Excessive spaces don't disappear

    Hi Statalist Community,

    I have a question to ask you, please.
    I'm trying to clean up a string variable so that I can then merge two similar databases, the only thing that changes is the years between them. These are data sets on the average prices of cars put up for sale in Spain.
    Here's what I do for year 2018:


    Code:
    //------ Local environment (adapt that to your current path) ------//
    * cd path
    global data "C:/Users/miduarte/Desktop/Ongoing_Projects/LEZ_Project/Data/car_prices_excel_15-19"
    cd "${data}"
    //------------//
    *
    *
    * 2018 Car Prices
    
    clear all
    import excel "${data}/car_prices_2018.xlsx", sheet("turismos") firstrow allstring case(lower)
    gen str type =  "turismos"
    tempfile turismos_18
    save `turismos_18', replace
    
    clear all
    import excel "${data}/car_prices_2018.xlsx", sheet("todoterreno") firstrow allstring case(lower)
    gen str type =  "todoterreno"
    tempfile todoterreno_18
    save `todoterreno_18', replace
    
    *Append with turismos
    clear*
    use `turismos_18', replace
    append using `todoterreno_18'
    
    replace modelotipo = itrim(trim(modelotipo))
    Problem: no changes take place, whereas both in the Excel file, and through a "manual" inspection through stata, I observe that several excessive spaces exist, even though running the code above.
    Here's an example:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str141 modelotipo
    "4C 1.7 Tbi TCT /  1.75 6V  240"      
    "4C SPIDER 1.7 Tbi TCT /  1.75 6V  240"
    "ALFA  33 1.3"                          
    "ALFA  33 1.5"                          
    "ALFA  33 1.7"                          
    end
    or maybe this example, which is striking:



    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str16 marca str141 modelotipo
    "LAND ROVER" "Defender II 90 TDI       SW / 2.5 Td5 Pick Up S / 2.5 Td5 SW E"        
    "LAND ROVER" "Defender II 90  TDI SW Back"                                                
    "LAND ROVER" "Defender II 90 TDI       Techo Duro-Lona/T.D.Comerc.SW/Td5 Pick Up E"  
    "LAND ROVER" "Defender II 90 TDI T. Duro Comerc."                                          
    "LAND ROVER" "Defender II 90  TDI Tomb Rider"                                             
    "LAND ROVER" "Defender II 110 Td5        2.5 SW E Com. / D. Cab Caja S / HCPU S Com"
    end


    Could anyone give me a helping hand with that, please?
    Thank you very much in advance!

    Michael
    Last edited by Michael Duarte Goncalves; 13 Mar 2024, 09:26.

  • #2
    I can't replicate your issue. The spaces leading, trailing and intermediate spaces are eliminated if I run your example.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str141 modelotipo
    "4C 1.7 Tbi TCT /  1.75 6V  240"      
    "4C SPIDER 1.7 Tbi TCT /  1.75 6V  240"
    "ALFA  33 1.3"                        
    "ALFA  33 1.5"                        
    "ALFA  33 1.7"                        
    end
    
    replace modelotipo = itrim(trim(modelotipo))
    dataex
    Res.:

    Code:
    . replace modelotipo = itrim(trim(modelotipo))
    (5 real changes made)
    
    . dataex
    
    ----------------------- copy starting from the next line -----------------------
    
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str141 modelotipo
    "4C 1.7 Tbi TCT / 1.75 6V 240"      
    "4C SPIDER 1.7 Tbi TCT / 1.75 6V 240"
    "ALFA 33 1.3"                        
    "ALFA 33 1.5"                        
    "ALFA 33 1.7"                        
    end

    Comment


    • #3
      Hi Andrew Musau:

      Thank you for your prompt response!

      This is weird. Here is what I obtained in my true dataset:

      Code:
       replace modelotipo = itrim(trim(modelotipo))
      (0 real changes made)
      In fact, I tried using -itrim(trim(modelotipo))- with the example provided, and it works. But definitely not in my whole dataset.

      Thank you anyway for your help Andrew.
      Last edited by Michael Duarte Goncalves; 13 Mar 2024, 09:31.

      Comment


      • #4
        You can check if you have other characters that look like spaces.

        Code:
        ssc install chartab, replace
        chartab
        Then see

        Code:
        help ustrtrim

        Comment


        • #5
          Spaces means what it says. There are all sorts of invisible or unprintable characters which may look like plain spaces, but they are not such and so will not be removed by any trimming function. An example is uchar(160).

          Code:
          . clear
          
          . set obs 1
          Number of observations (_N) was 0, now 1.
          
          . gen text = uchar(160) + "frog"
          
          . replace text = ltrim(text)
          (0 real changes made)
          Check for special characters using charlist or chartab [much better] from SSC.

          Comment


          • #6
            Oh thank you for this command Andrew Musau:

            Indeed, after using -chartab()-, I realize I have two "types" of white spaces:

            Code:
            160      \u00a0             |        41,878    NO-BREAK SPACE
            32       \u0020             |       230,287    SPACE
            Thank you again for your time and help!
            Last edited by Michael Duarte Goncalves; 13 Mar 2024, 09:39.

            Comment


            • #7
              Hi Nick,

              Nice to read you again! Yes, indeed I have some uchar(160) in my dataset. What does that mean?
              • Is there a way to delete those excessive uchar(160)?
              Thank you both for your time!

              Michael
              Last edited by Michael Duarte Goncalves; 13 Mar 2024, 09:40.

              Comment


              • #8

                Code:
                replace whatever = subinstr(whatever, uchar(160), "", .)

                Comment


                • #9
                  It works perfectly, thank you so much for your time and help.
                  Thanks again:

                  Nick Cox
                  Andrew Musau

                  Lovely afternoon to both of you.
                  Michael
                  Last edited by Michael Duarte Goncalves; 13 Mar 2024, 09:57.

                  Comment

                  Working...
                  X