  • Regular expression transform functions + eval() in Mata

    Mata lacks a function that applies a transformation (uppercase, lowercase, etc.) to the part of a string that matches a pattern. Imagine that you have 1600 observations in a string variable and must capitalize every letter that follows a comma.
    Here is one way to do it: the ustrregextf and ustrregexta functions, which, like ustrregexrf() and ustrregexra(), work on the first match and on all matches, respectively.

    Code:
    *! version 1.0.6  07oct2024
    
    version 16.0
    clear all
    
    loc RS        real scalar
    loc SS        string scalar
    loc SM        string matrix
    
    mata:
    mata set matastrict on
    
    `SM' ustrregextf(`SM' s1, `SS' re, `SS' t,| `RS' noc)
    {
        `RS' i, j
        `SS' g, s
        noc = noc != . ? noc : 0
    
        s = st_tempname()
        for(i = 1; i <= rows(s1); i++) {
            for(j = 1; j <= cols(s1); j++) {
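                // try the pattern; noc is passed on to ustrregexm()
                (void) ustrregexm(s1[i,j], re, noc)
                if ((g=ustrregexs(0)) != "") {
                    // improvised eval(): apply the function named in t to the
                    // match g and park the result in the temporary scalar s
                    stata(`"mata: st_strscalar(""'        + s + `"", "'            +
                                               t + `"(""' + g + `""))"')
                    // swap the first occurrence of the match for the result
                    s1[i,j] = usubinstr(s1[i,j], g,  st_strscalar(s), 1)
                }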
            }
        }
        return(s1)
    }
    
    `SM' ustrregexta(`SM' s1, `SS' re, `SS' t,| `RS' noc)
    {
        `RS' i, j
        `SS' g, g_flag, s
        noc = noc != . ? noc : 0
    
        s = st_tempname()
        for(i = 1; i <= rows(s1); i++) {
            for(j = 1; j <= cols(s1); j++) {
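                g_flag = ""    // forget the last result of the previous element
                while(1) {
                    (void) ustrregexm(s1[i,j], re, noc)
                    // stop when nothing matches, or when the match equals the
                    // previous result (it would otherwise be found forever)
                    if ((g=ustrregexs(0)) != "" & g != g_flag) {
                        // improvised eval(): apply the function named in t to g
                        stata(`"mata: st_strscalar(""'              + s + `"", "'  +
                                                         t + `"(""' + g + `""))"')
                        s1[i,j] = usubinstr(s1[i,j], g, (g_flag=st_strscalar(s)), 1)
                    } else break
                }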
            }
        }
        return(s1)
    }
    end
    
    version 18.0: lmbuild llanguagetools.mlib, replace size(4)
    Now all one needs to type is
    Code:
    mata: st_sview((X=""), ., "varname"); ustrregexta(X, ",\s*[a-z]", "ustrupper");
    or, for strL,
    Code:
    mata: st_sstore(., "varname", ustrregexta(st_sdata(., "varname"), ",\s*[a-z]", "ustrupper"))
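    To try the two functions without a dataset, something like the following sketch may help. The strings and the names A and B are made up for illustration, and it assumes the code above has already been run so that both functions are defined.
    Code:
    mata:
    A = ("one, two, three" \ "a,b")
    B = A                                        // independent copy of A
    (void) ustrregextf(A, ",\s*[a-z]", "ustrupper")
    (void) ustrregexta(B, ",\s*[a-z]", "ustrupper")
    A, B        // column 1: first match only; column 2: every match
    end
    The return values are discarded on purpose: Mata passes arguments by reference and both functions assign into s1, so A and B are changed in place. That appears to be why the st_sview() version modifies the variable without an explicit store, while the strL version, which works on a copy from st_sdata(), needs st_sstore().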
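    NB: The transformation function is passed by name as a string argument, not as a pointer. It is called through an improvised "eval()": the code builds a Mata command as a string, runs it with stata(), and reads the result back from a temporary Stata string scalar. (I added eval() to the topic title because people occasionally ask how to write one in Mata.)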
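    For those who only want the eval() part, here is a minimal sketch of the same trick pulled out of the loops. The name eval_str is hypothetical (it is not part of the library above), and it only handles functions that take one string scalar and return one.
    Code:
    mata:
    // hypothetical helper: call the Mata function named in f on arg
    string scalar eval_str(string scalar f, string scalar arg)
    {
        string scalar s

        s = st_tempname()       // name for a temporary Stata string scalar
        // builds and runs:  mata: st_strscalar("<s>", <f>("<arg>"))
        stata(`"mata: st_strscalar(""' + s + `"", "' + f + `"(""' + arg + `""))"')
        return(st_strscalar(s)) // read the result back
    }

    assert(eval_str("ustrupper", "abc") == "ABC")
    end
    Because arg is pasted into the command between plain double quotes, a value that itself contains a double quote would presumably break the generated command; the same caveat applies to matches passed through ustrregextf and ustrregexta.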
    Last edited by Ilya Bolotov; 08 Oct 2024, 16:33.