
  • Findname command's alternative usage

    Hi, I was wondering whether anyone here has managed to do more advanced variable selection using findname (ssc install findname).

    Code:
    clear
    forvalues i=1/11 {
    generate u`i' = .
    label variable u`i' "`i'"
    }
    
    . desc
    
    Contains data
     Observations:             0                  
        Variables:            11                  
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    u1              float   %9.0g                 1
    u2              float   %9.0g                 2
    u3              float   %9.0g                 3
    u4              float   %9.0g                 4
    u5              float   %9.0g                 5
    u6              float   %9.0g                 6
    u7              float   %9.0g                 7
    u8              float   %9.0g                 8
    u9              float   %9.0g                 9
    u10             float   %9.0g                 10
    u11             float   %9.0g                 11
    ------------------------------------------------------------------------------------------------------------------------------------------------------
    Sorted by: 
         Note: Dataset has changed since last saved.
    This works as expected:
    Code:
    . findname u*, varl("*1*")
    u1   u10  u11
    Moreover, the help file mentions that, for example, any(@ < 0) selects numeric variables in which any values are negative. However:

    Code:
    . findname u*, varl((@ >= 5 & @ <=9 ))
    u5
    I was expecting:
    Code:
    u5 u6 u7 u8 u9
    Thanks in advance.



  • #2
    findname (I think Nick prefers a reference to SJ dm_0048_4) does not work like that. In the varlabeltext() option, @ is just another character with no special meaning whatsoever. Thus,

    Code:
    findname u*, varl((@ >= 5 & @ <=9 ))

    roughly translates to: among all variables that start with the letter u, find those for which the variable label matches any of "(@", ">=", "5", "&", "@", or "<=9".
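
    As a rough illustration of that reading: each space-separated word inside varlabeltext() is its own pattern, so listing the wanted labels one by one should pick out the intended variables. This is only a sketch. It assumes findname is installed, rebuilds the toy data from #1, and relies on the matching behaviour described above.

    ```stata
    * Rebuild the toy data from #1: u1-u11 labelled "1" to "11"
    clear
    forvalues i = 1/11 {
        generate u`i' = .
        label variable u`i' "`i'"
    }

    * Assumes findname is installed; each word is a separate label pattern
    findname u*, varlabeltext(5 6 7 8 9)

    * The selected names are left behind in r(varlist)
    display "`r(varlist)'"
    ```

    This only scales to a handful of labels that can be listed explicitly; it is not a numeric range test.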


    If you tell us more about your real problem, we might be able to suggest a way forward.



    Cox, N. 2020. Finding variables. Stata Journal 20(2); earlier installments: Stata Journal 15: 605; 12: 167; 10: 691; 10: 281–296.
    Last edited by daniel klein; 03 May 2022, 08:27. Reason: corrections: the option is called varlabelTEXT; "contains" should be "matches"



    • #3
      Hi Daniel,

      My approach, (@ >= 5 & @ <=9 ), was based on the help file's usage tip: any(@ < 0) selects numeric variables in which any values are negative.

      My dataset has 600+ variables, and the variable labels hold a balance sheet chart of accounts (fixed 9-digit format).

      I need a sensible way to select variables based on their variable labels, store the selection in a macro (`r(varlist)'), and finally insert it into:

      Code:
      egen total=rowtotal(`r(varlist)')
      Thanks,



      • #4
        Code:
        . clear
        
        . forvalues i=1/11 {
          2. generate u`i' = .
          3. label variable u`i' "`i'"
          4. }
        
        .
        . ds, has(varlabel 5 6 7 8 9)
        u5  u6  u7  u8  u9
        
        . return list
        
        macros:
                    r(varlist) : "u5 u6 u7 u8 u9"
        
        . describe `r(varlist)'
        
        Variable      Storage   Display    Value
            name         type    format    label      Variable label
        ------------------------------------------------------------------------------------------------
        u5              float   %9.0g                 5
        u6              float   %9.0g                 6
        u7              float   %9.0g                 7
        u8              float   %9.0g                 8
        u9              float   %9.0g                 9
        
        .



        • #5
          daniel klein is right. The Stata Journal implementation of findname is definitive as the latest public version and -- my bad, if you like -- the version on SSC is no longer the latest.

          This is an interesting question. First, I confirm that the @ syntax is only understood and supported within the any() and all() options of findname. Used anywhere else it may be legal syntax, but it won't do what you want, because there @ is no longer interpreted as a placeholder for each variable name.
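
          For contrast, here is a minimal sketch of the one place where @ does have that meaning. It assumes findname is installed; the variable names x1 to x3 and their values are made up for illustration. any() tests the data values of each variable, not its label.

          ```stata
          * Toy data (illustrative names): only x2 holds a negative value
          clear
          set obs 3
          generate x1 = _n           // 1, 2, 3
          generate x2 = _n - 2       // -1, 0, 1
          generate x3 = 10 * _n      // 10, 20, 30

          * Assumes findname is installed; @ stands for each variable in turn
          findname, any(@ < 0)

          * The selected names are left behind in r(varlist)
          display "`r(varlist)'"
          ```

          With these values only x2 satisfies the condition, which is why the same expression cannot be used to test label text.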

          For what you want I think you need to fire up a regular expression check which goes beyond what findname supports at present.

          This works:

          Code:
          clear
          forvalues i=1/11 {
          generate u`i' = .
          label variable u`i' "`i'"
          }
          
          foreach v of var * {
             if regexm("`: var label `v''", "5|6|7|8|9") local wanted `wanted' `v'
          }
          
          d `wanted'
          
          
          Variable      Storage   Display    Value
              name         type    format    label      Variable label
          ---------------------------------------------------------------------------------------------------------------
          u5              float   %9.0g                 5
          u6              float   %9.0g                 6
          u7              float   %9.0g                 7
          u8              float   %9.0g                 8
          u9              float   %9.0g                 9
          
          .
          Holding crucial operational information within variable labels is an interesting tactic, but it's immensely easier on the whole to get Stata to select variable names.
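
          A small sketch of what selecting by name looks like, assuming the variables sit in the dataset in the order u1 to u11 (the constant values are only there to make the total easy to check):

          ```stata
          * Toy data: u1-u11, each holding its own index as a constant
          clear
          set obs 2
          forvalues i = 1/11 {
              generate u`i' = `i'
          }

          * u5-u9 is a varlist range: every variable from u5 to u9 in dataset order
          egen total = rowtotal(u5-u9)
          list total
          ```

          Here total is 5 + 6 + 7 + 8 + 9 = 35 in every row. The range depends on variable order, so it would need checking after any reordering of the dataset.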

          findname has to be put down as a relative failure. I made it public in 2010 as having both less awkward syntax than ds in certain respects (the certain respects were my own earlier work....) and more functionality, but it doesn't seem to have budged ds in popularity, perhaps partly through getting less publicity.



          • #6
            Thanks Nick,

            The regex approach works. However, for robustness, selecting by a numeric range seems safer: it will pick up every variable whose label falls within the boundaries after the next dataset update.

            I did some searching and ended up with this:

            Code:
            clear
            forvalues i=1/11 {
            generate u`i' = .
            label variable u`i' "`i'"
            }
            
            
            foreach v of var u* {
            local x: var label `v'
            local y= real("`x'")
               if (`y' >= 5 & `y' <= 9) local wanted `wanted' `v'
            }
            d `wanted'
            
            Variable      Storage   Display    Value
                name         type    format    label      Variable label
            ------------------------------------------------------------------------------------------------------------------------------------------------------
            u5              float   %9.0g                 5
            u6              float   %9.0g                 6
            u7              float   %9.0g                 7
            u8              float   %9.0g                 8
            u9              float   %9.0g                 9
            Hope it helps someone else.
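
            For the real use case in #3, the same loop can feed egen directly. The sketch below is an assumption-laden toy: the variable names acct1 to acct5, the 9-digit codes, and the boundaries 200000000 and 299999999 are all hypothetical stand-ins for the actual chart of accounts.

            ```stata
            * Hypothetical 9-digit account codes stored as variable labels
            clear
            set obs 2
            local codes 110000000 120000000 210000000 220000000 310000000
            local i = 0
            foreach c of local codes {
                local ++i
                generate acct`i' = `i'
                label variable acct`i' "`c'"
            }

            * Collect variables whose label, read as a number, lies in the range
            local wanted
            foreach v of varlist acct* {
                local code = real("`: variable label `v''")
                * Non-numeric labels give missing, which inrange() rejects
                if inrange(`code', 200000000, 299999999) local wanted `wanted' `v'
            }

            egen total = rowtotal(`wanted')
            display "`wanted'"
            ```

            With these made-up codes the loop keeps acct3 and acct4, so total is 3 + 4 = 7 in every row. Resetting wanted before the loop avoids carrying over names from an earlier run in the same session.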



            • #7
              @William Lisowski's approach in #4 is good for this specific problem. I think I didn't see it when I posted mine for a mundane Statalist reason: I started drafting mine before his was posted.

              In #6 this shortening is possible (under the very specific condition that the variable labels are single-digit integers):

              Code:
              clear
              forvalues i=1/11 {
              generate u`i' = .
              label variable u`i' "`i'"
              }
              
              foreach v of var u* { 
                  if inrange("`: var label `v''", "5", "9") local wanted `wanted' `v' 
              }
              
              d `wanted'
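
              Why the condition matters can be sketched with three one-liners: with string arguments inrange() compares in string order, not numeric order. The label "50" here is a hypothetical value, not one from the example data.

              ```stata
              * String comparison: "5" <= "50" <= "9" holds character by character
              display inrange("50", "5", "9")

              * Numeric comparison: 50 is not between 5 and 9
              display inrange(50, 5, 9)

              * Converting the label text first restores the numeric test
              display inrange(real("50"), 5, 9)
              ```

              The first line displays 1 and the other two display 0, so for multi-digit labels such as 9-digit account codes the real() conversion from #6 is the safer route.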
