  • tabstat with formats

    Hi All,

    Quick question about using tabstat with the format() option. I am summarizing a date variable by group:
    1. Group variable: pt15a
    2. Date variable: startdate (I need to identify the earliest date in each group)
    I ran the following code:
    Code:
     tabstat startdate, by(pt15a) stats(count min)
    and the result looked like this:

    [Screenshot: tabstat output, image not available]


    The startdate values were shown as unformatted numbers, so I added the format() option as below:

    Code:
    tabstat startdate, by(pt15a) stats(count min) nototal format(%tdDD/NN/CCYY)
    Unexpectedly, the date format was also applied to the number of observations.

    [Screenshot: tabstat output with format(), image not available]


    Could anyone please advise how I should modify the code? Many thanks.

  • #2
    I would switch to tabdisp here, although that is just one way to tackle the problem.

    A more esoteric tip: with some display formats, tabstat shows the sample size with decimal places, which I never want. tabdisp, by contrast, allows a mix of formats because, if all else fails, string variables can be used to show numerical results with any desired number of decimal places.

    Code:
    . clear
    
    . set obs 100
    Number of observations (_N) was 0, now 100.
    
    . set seed 314159
    
    . gen date = runiformint(mdy(1,1,1980), mdy(12,31,2010))
    
    . egen group = seq(), block(10)
    
    .
    . bysort group : gen N = strofreal(_N)
    
    . bysort group : egen first_date = min(date)
    
    . format first_date %td
    
    .
    . tabdisp group, c(N first_date)
    
    ----------------------------------
        group |          N  first_date
    ----------+-----------------------
            1 |         10   03nov1983
            2 |         10   07aug1989
            3 |         10   22nov1983
            4 |         10   07mar1980
            5 |         10   17sep1981
            6 |         10   11nov1989
            7 |         10   29apr1981
            8 |         10   15mar1982
            9 |         10   09sep1985
           10 |         10   26jun1981
    ----------------------------------
    
    .
    . bysort group : gen n = _N
    
    .
    . tabdisp group, c(n first_date)
    
    ----------------------------------
        group |          n  first_date
    ----------+-----------------------
            1 |         10   03nov1983
            2 |         10   07aug1989
            3 |         10   22nov1983
            4 |         10   07mar1980
            5 |         10   17sep1981
            6 |         10   11nov1989
            7 |         10   29apr1981
            8 |         10   15mar1982
            9 |         10   09sep1985
           10 |         10   26jun1981
    ----------------------------------
    The string version N isn't needed for the example above, as the numeric n displays identically. But remember the trick: call strofreal() on the fly with a specified display format.

    For example, moments from SSC allows any combination of formats for its results. The default is to show integers for sample size and 3 d.p. for everything else. Here you decide that 1 d.p. is enough for the mean and 2 d.p. enough for the SD, and accept the defaults for skewness and kurtosis.

    Code:
    . sysuse auto, clear
    (1978 automobile data)
    
    . moments mpg, by(foreign)
    
    ----------------------------------------------------------------------
        Group |          n        mean          SD    skewness    kurtosis
    ----------+-----------------------------------------------------------
     Domestic |         52      19.827       4.743       0.771       3.441
      Foreign |         22      24.773       6.611       0.657       3.107
    ----------------------------------------------------------------------
    
    . moments mpg, by(foreign) format(%2.1f %3.2f)
    
    ----------------------------------------------------------------------
        Group |          n        mean          SD    skewness    kurtosis
    ----------+-----------------------------------------------------------
     Domestic |         52        19.8        4.74       0.771       3.441
      Foreign |         22        24.8        6.61       0.657       3.107
    ----------------------------------------------------------------------
    You don't have to care about moments, or moment-based measures, for the example to be useful, as a glance at the code for moments shows how you can mix display formats for yourself.
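
    As a minimal sketch of the strofreal() trick outside moments (this is not the moments code itself, and the variable names n, mean_mpg and mean_show are made up for illustration), the following shows an integer sample size next to a mean held as a string with 1 d.p.:

```stata
* Sketch only: variable names n, mean_mpg and mean_show are illustrative.
sysuse auto, clear

* Group sample size, kept numeric so it displays as an integer
bysort foreign : gen n = _N

* Group mean, then a string copy with exactly 1 decimal place
bysort foreign : egen mean_mpg = mean(mpg)
gen mean_show = strofreal(mean_mpg, "%4.1f")

* tabdisp accepts a mix of numeric and string cell variables
tabdisp foreign, c(n mean_show)
```

    Because mean_show is a string, its display no longer depends on any numeric display format, so each column can be controlled separately.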

    For more tips in the same spirit, see https://journals.sagepub.com/doi/pdf...867X1201200109
    Last edited by Nick Cox; 04 Jul 2023, 02:15.



    • #3
      If you're on Stata 17 or later, you could also achieve this with the table command. Something like:

      Code:
      table pt15a , stat(count startdate) stat(min startdate) nformat(%tdDD/NN/CCYY min)
      Last edited by Hemanshu Kumar; 05 Jul 2023, 06:08.
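
      For anyone who wants to try this without the original data, here is a sketch on simulated data in the style of #2. Only the variable names pt15a and startdate come from the question; the values are invented, and the command assumes Stata 17 or later.

```stata
* Simulated stand-in for the original data: values are invented,
* only the variable names pt15a and startdate come from the question.
clear
set obs 100
set seed 314159
generate startdate = runiformint(mdy(1,1,1980), mdy(12,31,2010))
egen pt15a = seq(), block(10)

* Stata 17+: nformat() is restricted to the min result here,
* so the count should keep its default numeric format.
table pt15a, statistic(count startdate) statistic(min startdate) nformat(%tdDD/NN/CCYY min)
```

      Naming the result (min) after the format in nformat() is what keeps the date format away from the count.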



      • #4
        Many thanks to Nick Cox and Hemanshu Kumar !

