  • 😕 Matrix colnames doesn't respect the space character in the names of the columns

    Dear All,

    I have encountered a strange behavior of the colnames extended macro, which doesn't seem to respect the spaces included in the matrix colnames (I didn't test, but I believe the situation with the rownames would be the same).

    Code:
    matrix drop _all
    // create example matrix A:
    matrix A=0,0
    matrix colnames A="Adult males" "Adult females"
    matrix list A    // shows colnames correctly
    
    display `"`: colnames A'"'
    display `"`:word 1 of `: colnames A''"'   // problem
    My expectation was that the colnames extended macro function would return the contents of the colnames structure, with quotes added as necessary around names containing spaces. Instead, nothing in the returned content allows me to parse it correctly and attribute the parts to the different columns of the matrix when the names contain spaces.
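
    To make the ambiguity concrete, here is a minimal sketch that counts the words the extended macro function returns. It assumes the two-column matrix A from above; the local name names is only illustrative.

    Code:
    // Sketch: the unquoted result cannot be split back into column names
    matrix A = 0,0
    matrix colnames A = "Adult males" "Adult females"
    local names : colnames A
    // the two names come back as one flat string of four words
    display `: word count `names''
    Given the behavior described above, this should report 4 words for a matrix that has only 2 columns.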

    Since the built-in command shows the colnames structure correctly, it means it is not a storage problem, but the extended macro's problem. So while looking for a workaround I came up with the following 2 solutions: with and without the use of Mata:

    Code:
    // Solution without Mata:
    tempname tmpM
    matrix `tmpM'=A[....,1]
    local l1 `"`: colnames `tmpM''"'
    
    matrix `tmpM'=A[....,2]
    local l2 `"`: colnames `tmpM''"'
    
    display `"`l1'"'
    display `"`l2'"'
    Code:
    // Solution with Mata:
    mata st_local("l1m",st_matrixcolstripe("A")[1,2])
    display `"`l1m'"'
    
    mata st_local("l2m",st_matrixcolstripe("A")[2,2])
    display `"`l2m'"'
    Both seem to work fine, but I don't see the need for such acrobatics to extract the colnames/rownames.
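
    Either workaround extends to any number of columns. Below is a minimal sketch of the Mata variant in a loop, assuming matrix A from above still exists; the local names name1, name2, ... are hypothetical and chosen only for illustration.

    Code:
    // Sketch: store each column name of A in its own local macro
    forvalues j = 1/`=colsof(A)' {
        // second column of the column stripe holds the name; first holds the equation
        mata: st_local("name`j'", st_matrixcolstripe("A")[`j', 2])
        display `"`name`j''"'
    }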

    Hope this would be useful for someone who bumps into the same problem. Should there be a better way, please let me know.

    PS: coincidentally, this is the (long forgotten) question I asked ~10 years ago, which didn't get any responses at the time. Now both the cause of the problem and the workarounds are clear. It is also a good indication that this is unlikely to be fixed/changed at all.


    Best, Sergiy Radyakin

  • #2
    I propose an explanation for the problem with the colnames extended macro function.

    When we look at the output of help matrix colnames we see the following.
    Code:
        Reset column names of matrix
    
            matrix colnames A = names
    
        ...
    
        where name can be
    
            o a simple name;
            o a colon followed by a simple name;
            o an equation name followed by a colon; or
            o an equation name, a colon, and a simple name.
    
        and a simple name may be augmented with time-series operators and factor-variable
        specifications.
    Now, while there's no link from help matrix colnames to the documentation to explain what a "simple name" is, I expected that by "simple name" they meant a name as described in the Stata User's Guide PDF section 11.3 on Naming conventions.

    A name is a sequence of 1 to 32 letters (A–Z, a–z, and any Unicode letter), digits (0–9), and
    underscores (_).
    ...
    All objects in Stata, not just variables, follow this naming convention.
    And this interpretation is somewhat reinforced by the lack of any rowname or colname in the examples that does not adhere to this standard for the "simple name".

    However, in fact, the matrix commands allow arbitrary column names, and thus do not adhere to the standards from the User's Guide.
    Code:
    . matrix drop _all
    
    . matrix A = 666
    
    . matrix colnames A="snafu !@#$"
    
    . matrix list A
    
    symmetric A[1,1]
        snafu !@#$
    r1         666
    
    . display colnumb(A,"snafu !@#$")
    1
    I'm guessing the author of the colnames extended macro function read the documentation for the matrix colnames command and made the same assumption I did, given that the documentation offers no other definition of "simple name" and never mentions that a name with an embedded space must be enclosed in quotation marks on the matrix colnames command.


    • #3
      While currently not documented, all the extended macro functions for matrix stripes (:colname, :rowname, :colfullname, and :rowfullname) allow the quoted option, just like the extended macro functions :coleq and :roweq.

      Here is a brief example:

      Code:
      . sysuse auto
      (1978 Automobile Data)
      
      . quietly gsem mpg <- turn trunk
      
      . di `"`:colname e(b), quoted'"
      `"turn"' `"trunk"' `"_cons"' `"var(e.mpg)"'"
      
      . di `"`:colfullname e(b), quoted'"
      `""mpg":turn"' `""mpg":trunk"' `""mpg":_cons"' `"/var(e.mpg)"'"
      While the quoted option is not necessary for working with most stripes, Sergiy and William make the case for us to do a better job documenting matrix stripes and mentioning support for the quoted option in the above extended macro functions too.
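
      Applied to the matrix from #1, the same option can be sketched as follows. This assumes the undocumented quoted option behaves for colnames as shown in the example above; the local names names and first are only illustrative.

      Code:
      // Sketch: retrieve column names that contain spaces, one name per word
      matrix A = 0,0
      matrix colnames A = "Adult males" "Adult females"
      local names : colnames A, quoted
      // each compound-quoted name is treated as a single list element
      foreach name of local names {
          display `"`name'"'
      }
      // extract the first name on its own
      local first : word 1 of `names'
      display `"`first'"'
      Because each name arrives wrapped in compound quotes, the loop and the word extraction should both treat "Adult males" as one element rather than two.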


      • #4
        Dear William,

        thank you very much for the comment. This is all good, but then StataCorp is not following the same rules and is actively using the non-simple names in the matrices. Here is an example from the bayestest.ado shipped with Stata 16.0
        Code:
        matrix colnames `mcmcsum' = "Chains" "Avg log(ML)" "P(M)" "P(M|y)"
        It violates both "single word" and "no special characters" requirements.

        Best, Sergiy Radyakin



        • #5
          My post #4 has crossed with Jeff's answer in #3.

          Thank you very much, Jeff Pitblado (StataCorp); the undocumented quoted option is helpful and solves the problem.


          • #6
            Jeff Pitblado (StataCorp) - Thank you for the secret password.

            Let me suggest that help matrix colnames should add some sort of indication of what constitutes a "simple name". I was going to recommend adding a link to [U] 14.2 Row and column names but in fact I don't see any prominent mention of this in the several pages of documentation there, and in any event it would be good to have it in the help file.

            As implied by my post #2, the use of "name" is somewhat problematic because of the emphasis given in [U] 11.3 Naming conventions to the extensive range of this convention. I wondered if a matrix column or row perhaps did not constitute an object, and I guess that is the case. It would be good to add to [U] 11.3 some discussion of "names that are not subject to this convention" since it is so emphatic about names that are subject to the convention.


            • #7
              Sergiy Radyakin and William Lisowski you are welcome, glad to help, and sorry for keeping secrets.

              Thank you for your notes and suggestions. We will use them to improve the documentation of matrix stripes in Stata.


              • #8
                Jeff Pitblado (StataCorp)

                Using this opportunity, is there any chance that the following two enhancements could be accommodated in the future [or distant future] versions of Stata?

                1. Rownames/colnames to have no upper limit of 32 characters, but to match the limit for value labels (currently 32,000)? This is needed to store value labels for table building in row and column titles, rather than in accompanying structures.

                2. Mata matrices to have rownames/colnames too? (for more natural manipulation of Stata matrices in Mata).

                Thank you, Sergiy
