Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Table - unexpected behaviour

    I have a dataset that has been imported from Excel, I do not have the spreadsheet, and the variable Sex should be either M,F or missing, unfortunately some of the values included a trailing space. When I summarise the data I get these results using tab, fre(from SSC) and table:


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str2 Sex int _freq
    ""     5
    "F"   47
    "M"  104
    "M "   1
    end
    Code:
    . expand _freq
    (153 observations created)
    . chartab Sex
    
       decimal  hexadecimal   character |     frequency    unique name
    ------------------------------------+----------------------------------------
            32       \u0020             |             1    SPACE
            70       \u0046       F     |            47    LATIN CAPITAL LETTER F
            77       \u004d       M     |           105    LATIN CAPITAL LETTER M
    ------------------------------------+----------------------------------------
    
                                        freq. count   distinct
    ASCII characters              =             153          3
    Multibyte UTF-8 characters    =               0          0
    Unicode replacement character =               0          0
    Total Unicode characters      =             153          3
    
    
    . tab Sex
    
            Sex |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              F |         47       30.92       30.92
              M |        104       68.42       99.34
             M  |          1        0.66      100.00
    ------------+-----------------------------------
          Total |        152      100.00
    
    . fre Sex
    
    Sex -- Sex
    -----------------------------------------------------------
                  |      Freq.    Percent      Valid       Cum.
    --------------+--------------------------------------------
    Valid   F     |         47      29.94      30.92      30.92
            M     |        104      66.24      68.42      99.34
            M     |          1       0.64       0.66     100.00
            Total |        152      96.82     100.00           
    Missing       |          5       3.18                      
    Total         |        157     100.00                      
    -----------------------------------------------------------
    
    . table Sex
    
    --------------------
            |  Frequency
    --------+-----------
    Sex     |           
      F     |         47
      M     |          1
      Total |        152
    --------------------
    
    .
    My query is why does table does not recognise all the values of Sex?

    Thank you,
    Martyn

    Stata 17 MP

  • #2
    I think this is still an issue with how -table- handles string variables. It can also be avoided with two data management steps which are better suited to this particular problem (though doesn't dismiss the relevant issue with table). One, string variables are best trimmed of leading and trailing whitespace. Second, and more appropriate, would be to use a labeled numeric variable which will not cause this issue with table.

    Comment


    • #3
      Thank you for your response. As you suggest I normally trim my string variables, but in this case I was just looking a new dataset and found the problem. I think that this is a serious issue with table and could lead to errors; I will ask Stata Support for their comments.

      Comment


      • #4
        Stata Technical Support have confirmed that this is a bug in tables, and that it will be fixed in a future update.

        Comment

        Working...
        X