  • Concatenating strings piece by piece

    Dear Statalisters,

    I want to concatenate two string variables. Instead of joining them whole, e.g. with concat(), I need them combined "piece-wise": character 1 of variable A (e.g. gender) is paired with character 1 of variable B (e.g. origin), and so on. Ideally, a delimiter should be added between the pairs.

    What I have:

    Code:
    clear
    input str20 (A B)
    1111 2345
    end
    What I need:

    Code:
    clear
    input str20 (A B C)
    1111 2345 12_13_14_15
    end
    There might be solutions available that combine multiple loops and temp variables, but I was wondering if there is a more straightforward way to do this. Naturally, the actual number of observations is much larger and I am aware that this is not an optimal way to organize data.

    Thank you and best regards
    Sebastian

  • #2
    If the length is fixed (four digits in each variable here), a regex of the form below will do. For variable lengths, a single loop may be the most straightforward approach.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str20(A B C)
    "1111" "2345" "12_13_14_15"
    end
    
    assert length(A)==length(B)
    gen wanted1= ustrregexra(A+ " "+ B, "([\d])([\d])([\d])([\d])\s([\d])([\d])([\d])([\d])", "$1$5_$2$6_$3$7_$4$8")
    
    gen wanted2=""
    gen length= length(A)
    qui sum length
    forval i=1/`r(max)'{
        qui: replace wanted2= wanted2+ substr(A, `i', 1)+ substr(B, `i', 1) + "_" if `i'<=length
    }
    replace wanted2= substr(wanted2, 1, length(wanted2)-1)
    Result:

    Code:
    . l
    
         +----------------------------------------------------------------+
         |    A      B             C       wanted1       wanted2   length |
         |----------------------------------------------------------------|
      1. | 1111   2345   12_13_14_15   12_13_14_15   12_13_14_15        4 |
         +----------------------------------------------------------------+
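
    The listing above has only one observation of length 4, so it does not exercise the variable-length case. As a minimal sketch, assuming made-up example data with different lengths across observations (the rows and the names wanted3 and len are illustrative, not from the original question), the same loop idea can be checked like this. It uses cond() to append the delimiter only between pairs, so no trailing "_" has to be stripped afterwards.

```stata
clear
input str20 (A B)
"1111" "2345"
"12"   "34"
"7"    "8"
end

* Pairing only makes sense if A and B have the same length within each row
assert strlen(A) == strlen(B)

gen wanted3 = ""
gen len = strlen(A)
quietly summarize len

* Loop over character positions up to the longest string in the data
forvalues i = 1/`r(max)' {
    * Append the i-th pair; add "_" only if another pair follows in this row
    quietly replace wanted3 = wanted3 + substr(A, `i', 1) + substr(B, `i', 1) ///
        + cond(`i' < len, "_", "") if `i' <= len
}

list A B wanted3
```

    Rows shorter than the maximum length are simply skipped by the if qualifier once `i' exceeds their length.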
    Last edited by Andrew Musau; 02 Aug 2023, 11:15.


    • #3
      Hey Andrew,

      this was exactly the kind of elegant solution I was looking for:

      Code:
      assert length(A)==length(B)
      gen wanted1= ustrregexra(A+ " "+ B, "([\d])([\d])([\d])([\d])\s([\d])([\d])([\d])([\d])", "$1$5_$2$6_$3$7_$4$8")
      Thank you for your help!
      Last edited by Sebastian Schirner; 03 Aug 2023, 02:09.
