How to differentiate "tabulate", "table", "tabstat", "tabdisp"?

Yao Zhao

#1

25 Nov 2018, 13:39

I don't know which of these commands to use in which situation.

Many thanks in advance!

David Benson


25 Nov 2018, 21:11

So, it's probably easiest to see with some examples. BTW, I end up using tabulate and tabstat a lot, table less so, and I have to look up the syntax every time I do. I've never used tabdisp.

So, tabulate is great for counting the number of observations in various categories:

Code:

. tabulate year_founded if sample==1 & inrange(year_founded, 1995, 2000)

       Year |
    founded |
    (min of |
   founding |
  year from |
    Gerard, |
NCET, NETS, |
      & VX) |      Freq.     Percent        Cum.
------------+-----------------------------------
       1995 |         39        7.56        7.56
       1996 |         68       13.18       20.74
       1997 |         85       16.47       37.21
       1998 |         97       18.80       56.01
       1999 |         89       17.25       73.26
       2000 |        138       26.74      100.00
------------+-----------------------------------
      Total |        516      100.00


. tabulate year_founded target_success if  sample==1 & inrange(year_founded, 1995, 2000)

      Year |
   founded |
   (min of |
  founding |
 year from |
   Gerard, |
     NCET, |  1 if target had IPO
   NETS, & |    or acquisition
       VX) |         0          1 |     Total
-----------+----------------------+----------
      1995 |        14         25 |        39
      1996 |        40         28 |        68
      1997 |        56         29 |        85
      1998 |        59         38 |        97
      1999 |        64         25 |        89
      2000 |       105         33 |       138
-----------+----------------------+----------
     Total |       338        178 |       516


. tabulate target_status if  sample==1 & inrange(year_founded, 1995, 2000)

   IPO, Acquired, |
etc. Zombie means |
         <=2 emps |      Freq.     Percent        Cum.
------------------+-----------------------------------
       1 - Zombie |         44        8.53        8.53
2 - Going Concern |        230       44.57       53.10
     3 - Acquired |        113       21.90       75.00
          4 - IPO |         65       12.60       87.60
       5 - Failed |         64       12.40      100.00
------------------+-----------------------------------
            Total |        516      100.00
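
For anyone who wants to run something similar, here is a minimal sketch using the auto dataset that ships with Stata. The variables rep78 and foreign are stand-ins (an assumption for illustration) for the year and outcome variables above, which come from a private dataset.

```stata
* Load the example dataset shipped with Stata (stand-in for the data above)
sysuse auto, clear

* One-way frequency table: counts, percentages, cumulative percentages
tabulate rep78

* Same table, but missing values of rep78 get their own row
tabulate rep78, missing

* Two-way table of counts, with row percentages added to each cell
tabulate rep78 foreign, row
```

The if qualifier used above works the same way here, e.g. tabulate rep78 if price > 5000.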

tabstat is a lot like summarize, but it gives you more control over which statistics to report. I use it a lot because I like to see the median.

Code:

* Doing it with summarize
. summ max_emp cum_patents_age3 age_exit if located_in_cluster ==0 & sample==1 & inrange(year_founded, 1995, 2000)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     max_emp |        277     29.6787    84.97388          1       1060
cum_patent~3 |        319    1.416928    4.083079          0         41
    age_exit |         98    9.877551    4.899538        -11         21

. summ max_emp cum_patents_age3 age_exit if located_in_cluster ==1 & sample==1 & inrange(year_founded, 1995, 2000)

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     max_emp |        185    114.6703    385.0243          1       3000
cum_patent~3 |        197    4.137056    11.53241          0         76
    age_exit |        111    7.828829     4.31672          0         20


* Doing the same thing with tabstat
. tabstat max_emp cum_patents_age3 age_exit if  sample==1 & inrange(year_founded, 1995, 2000), stats(n mean p25 median p75 min max) col(stat)
>  by(located_in_cluster)

Summary for variables: max_emp cum_patents_age3 age_exit
     by categories of: located_in_cluster (1 if startup located in Silicon V, Boston, or San Diego (no matter where started)

located_in_cluster |         N      mean       p25       p50       p75       min       max
-------------------+----------------------------------------------------------------------
                 0 |       277   29.6787         4        10        25         1      1060
                   |       319  1.416928         0         0         1         0        41
                   |        98  9.877551         7        10        13       -11        21
-------------------+----------------------------------------------------------------------
                 1 |       185  114.6703         6        21        65         1      3000
                   |       197  4.137056         0         0         2         0        76
                   |       111  7.828829         5         7        11         0        20
-------------------+----------------------------------------------------------------------
             Total |       462  63.71212         5        13        35         1      3000
                   |       516  2.455426         0         0         1         0        76
                   |       209  8.789474         5         8        12       -11        21
------------------------------------------------------------------------------------------
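
A reproducible version of the same idea, again as a sketch on the auto dataset. Here price, mpg, and weight stand in for the three variables above, and foreign plays the role of located_in_cluster (all of these are assumptions for illustration).

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Summary statistics by group, one line per variable, statistics in columns.
* statistics() chooses what to report; by() adds one block per category
* plus a Total block.
tabstat price mpg weight, statistics(n mean p25 median p75 min max) ///
    columns(statistics) by(foreign)
```

The /// continuation only works in a do-file; in the Command window, type the command on one line.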

table is a little more flexible, but it usually requires more typing:

Code:

. table located_in_cluster if sample==1 & inrange(year_founded, 1995, 2000), c(n max_emp mean max_emp median max_emp p75 max_emp) row col

----------------------------------------------------------------------
1 if      |
startup   |
located   |
in        |
Silicon   |
V,        |
Boston,   |
or San    |
Diego (no |
matter    |
where     |
started   |    N(max_emp)  mean(max_emp)   med(max_emp)   p75(max_emp)
----------+-----------------------------------------------------------
        0 |           277        29.6787             10             25
        1 |           185       114.6703             21             65
          |
    Total |           462       63.71212             13             35
----------------------------------------------------------------------
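
One caveat for readers on newer versions: the table command was rewritten in Stata 17, and the contents() syntax shown above belongs to the older command. The sketch below uses the auto dataset, with foreign and price as stand-in variables, and shows what I believe are the two equivalent forms. Treat the details as assumptions and check help table on your own installation.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Old syntax (Stata 16 and earlier). On Stata 17 or later it should still
* run under version control, as written here; on Stata 16 itself, drop the
* "version 16:" prefix.
version 16: table foreign, contents(n price mean price median price p75 price) row

* New syntax (Stata 17 and later): one statistic() option per statistic
table foreign, statistic(count price) statistic(mean price) ///
    statistic(median price) statistic(p75 price)
```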

Hope that helps!


Clyde Schechter

#3

25 Nov 2018, 22:18

Originally posted by David Benson: "I've never used tabdisp."

I suspect that the vast majority of Stata users have never used it either. -tabdisp- is just a command that writes data from a long data set onto the Results window in the layout that -table- produces. -table- is, in fact, a wrapper for -tabdisp-: it produces an appropriate data set of statistics that is a suitable input for -tabdisp-, calls -tabdisp-, restores the original data, and then exits. The "produces an appropriate data set" part is done with -collapse-.
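
To make that sequence concrete, here is a rough sketch of the same steps done by hand on the auto dataset. It illustrates the idea only and is not the actual code inside -table-; the new variable names (n_price, mean_price, med_price) are made up for the example.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Keep a copy of the original data so it can be brought back afterwards
preserve

* Reduce the data to one observation per category, holding the statistics
collapse (count) n_price=price (mean) mean_price=price ///
    (median) med_price=price, by(foreign)

* Display the collapsed data set as a table: one row per value of foreign,
* one cell entry per variable listed in cellvar()
tabdisp foreign, cellvar(n_price mean_price med_price)

* Bring back the original data
restore
```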

I have used -tabdisp- a handful of times in the 24 years I have been using Stata. It came in handy when the data set was very large and a complicated table was needed, because the calls to -collapse- that -table- makes can be time-consuming. In that situation, you can gain appreciable efficiency by generating the statistics for -tabdisp- yourself, using -gen-, -egen-, and -keep if _n == 1- commands tailored specifically to your problem. These run much faster than -collapse-, which is burdened with decoding an elaborate syntax and having to cope with all manner of special situations and problems that might arise in the general case. -tabdisp- then writes the results out in the desired layout.

In general, though, it's just simpler to use -table-, and the efficiency penalty for doing so is usually ignorable.
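
A minimal sketch of the hand-rolled approach described above, using the auto dataset. The variable names n_price, mean_price, and med_price are hypothetical, and on a data set this small there is no speed benefit; the point is only to show the pattern.

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear
preserve

* Compute each statistic within groups; every observation in a group
* gets the same value
bysort foreign: egen n_price    = count(price)
bysort foreign: egen mean_price = mean(price)
bysort foreign: egen med_price  = median(price)

* Keep one observation per group, so each table cell is defined once
bysort foreign: keep if _n == 1

* Write the statistics out in table layout
tabdisp foreign, cellvar(n_price mean_price med_price)

restore
```

-tabdisp- expects at most one observation per cell, which is why the -keep if _n == 1- step is done within each group.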