
  • Detailed summary statistics with dummy variables

    Hi guys,

    I need to create a descriptive statistics table split by a dummy variable. With tab ..., sum(...) I can get the means and standard deviations, but I also want the number of observations, minimums and maximums. I can get those by summarizing the variables, but then the results are not split by the dummy (1 or 0).

    So is there an extension of this tab, sum command that gives more detailed summary statistics?
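    A minimal sketch of the two approaches described above, using the auto data that ship with Stata rather than the poster's own variables (foreign stands in for the dummy and mpg for a variable of interest; both choices are only for illustration):

```stata
sysuse auto, clear

* tabulate with summarize(): mean, SD and frequency of mpg for each value of foreign
tabulate foreign, summarize(mpg)

* by-prefix with summarize: N, mean, SD, min and max of mpg, split by foreign
bysort foreign: summarize mpg
```

    Adding the detail option to summarize also reports the median, but the output is still one block per group rather than a single table.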

  • #2
    You say dummy, I say indicator (and others line up on different sides). Much more on that from two of your stalwarts at https://journals.sagepub.com/doi/ful...36867X19830921

    In the auto data, foreign is an indicator; its two distinct values, 0 and 1, are thus the minimum and maximum, occurring 52 and 22 times respectively.

    Code:
    . sysuse auto
    (1978 Automobile Data)
    
    . tab foreign
    
       Car type |      Freq.     Percent        Cum.
    ------------+-----------------------------------
       Domestic |         52       70.27       70.27
        Foreign |         22       29.73      100.00
    ------------+-----------------------------------
          Total |         74      100.00

    A command could certainly be written to report the number of zeros, the number of ones, and the mean and SD of an indicator.
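    A rough sketch of what such a command would report, written as a few lines of do-file code rather than as a proper command (the local macro names zeros and ones are arbitrary):

```stata
sysuse auto, clear

* number of zeros and number of ones in the indicator foreign
count if foreign == 0
local zeros = r(N)
count if foreign == 1
local ones = r(N)

* mean (the proportion of ones) and SD of the indicator
quietly summarize foreign
display "zeros: `zeros'  ones: `ones'  mean: " %5.3f r(mean) "  SD: " %5.3f r(sd)
```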



    • #3
      It is not clear to me what exactly your variables are, and what you want your table to look like. Without that it is hard for me to give you advice. Can you give an extract of your data (see help dataex) and an example of the table you want to produce?

      However, I do have some comments. For a binary variable you typically don't report the standard deviation, because for binary variables the standard deviation is just a function of the mean and the number of observations and nothing else, so it adds exactly nothing. Similarly, the minimum and maximum of binary variables are hardly informative: they are just 0 and 1 (and even if they are not, they will be treated as if they were).
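      To make the point concrete: for a 0/1 variable with proportion of ones p and n observations, the sample SD reported by summarize (which divides by n - 1) equals sqrt(p(1 - p) n/(n - 1)). A quick check with the auto data, where foreign stands in for any indicator (the local macro names are arbitrary):

```stata
sysuse auto, clear

quietly summarize foreign
local p = r(mean)
local n = r(N)

* SD recomputed from the mean and the number of observations alone
local sd_from_p = sqrt(`p' * (1 - `p') * `n' / (`n' - 1))
display "summarize SD: " r(sd) "   from p and n: " `sd_from_p'
```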

      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------



      • #4
        Hi guys,

        Thanks for your reactions. @Maarten, I am not interested in summary statistics of the dummy/indicator itself. Here is some more information:

        This is an extract from my dataset:
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input byte dummy float(ROA_w MTB_w logSIZE_w)
        1  .015737535  2.702192  7.197375
        1  .014516745 1.5419233   6.55542
        1  .006441738   2.69291   7.27595
        1  .068453796  4.180245  6.569367
        0   .05717357  3.168955  8.360762
        1  .032652512  3.789051  4.064829
        1  .015562552 1.7704692  5.699063
        1  .018427841  2.544254  6.847771
        1  .013549432  1.822712  6.019018
        1   .04066389 1.8643322  4.572402
        1   .17993362   2.40383   3.09308
        end

        Now what I want is a table with two panels: Panel A for the firms that score 1 on the dummy/indicator and Panel B for the firms that score 0.
        For each of the three other variables, the table should report the mean, median, standard deviation, minimum, maximum, and number of observations.

        So, for each panel:

        Variable name   Mean   Median   St. dev.   Minimum   Maximum   N
        variable 1
        variable 2
        variable 3
        If you need more information, please ask!



        • #5
          Code:
          clear
          input byte dummy float(ROA_w MTB_w logSIZE_w)
          1  .015737535  2.702192  7.197375
          1  .014516745 1.5419233   6.55542
          1  .006441738   2.69291   7.27595
          1  .068453796  4.180245  6.569367
          0   .05717357  3.168955  8.360762
          1  .032652512  3.789051  4.064829
          1  .015562552 1.7704692  5.699063
          1  .018427841  2.544254  6.847771
          1  .013549432  1.822712  6.019018
          1   .04066389 1.8643322  4.572402
          1   .17993362   2.40383   3.09308
          end
          
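          * one row per variable and dummy group, one column per statistic
          matrix res = J(6,6,.)
          matrix colnames res = mean median st_dev minimum maximum N
          * equation "first" holds dummy == 0, equation "second" holds dummy == 1
          matrix rownames res = first:ROA_w first:MTB_w first:logSIZE_w ///
                                second:ROA_w second:MTB_w second:logSIZE_w
          
          local i = 1
          forvalues d = 0/1 {
              foreach var of varlist ROA_w MTB_w logSIZE_w {
                  * the detail option is needed to get the median, r(p50)
                  qui sum `var' if dummy == `d', detail
                  matrix res[`i',1] = r(mean), r(p50), r(sd), r(min), r(max), r(N)
                  local i = `i' + 1
              }
          }
          * underscore displays underscores in row and column names as spaces
          matlist res, underscore format(%9.3g)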


          • #6

            Code:
            * !!! 
            help tabstat
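
            A minimal sketch of what tabstat can do here, applied to the example data posted in #4. The statistic list follows the table requested there; longstub widens the left column so that variable names are shown next to the groups, and nototal drops the pooled block. Treat the exact option mix as a suggestion to check against help tabstat:

```stata
clear
input byte dummy float(ROA_w MTB_w logSIZE_w)
1  .015737535  2.702192  7.197375
1  .014516745 1.5419233   6.55542
1  .006441738   2.69291   7.27595
1  .068453796  4.180245  6.569367
0   .05717357  3.168955  8.360762
1  .032652512  3.789051  4.064829
1  .015562552 1.7704692  5.699063
1  .018427841  2.544254  6.847771
1  .013549432  1.822712  6.019018
1   .04066389 1.8643322  4.572402
1   .17993362   2.40383   3.09308
end

* one block of rows per value of dummy, one column per statistic
tabstat ROA_w MTB_w logSIZE_w, by(dummy) ///
    statistics(mean median sd min max n) columns(statistics) longstub nototal
```

            Because by() orders the groups by value, the dummy == 0 block comes first and Panel A (dummy == 1) is the second block. In the posted extract the dummy == 0 group has a single observation, so its SD is missing.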
