Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • "collapse by" with total

    Is anyone aware of a neat way of obtaining both the total AND the stratified rows when using the "collapse, by" command?

    For example:

    Code:
    clear
    input float(sex death population)
    1 1 1
    1 0 1
    1 0 1
    1 1 1
    1 1 1
    1 0 1
    1 0 1
    1 0 1
    2 1 1
    2 1 1
    2 1 1
    2 0 1
    2 0 1
    2 1 1
    2 1 1
    2 1 1
    end
    It seems a bit long winded to have to run:

    Code:
    collapse (sum)death=death (sum)population=population
    and then recall the original dataset and run:

    Code:
    collapse (sum)death=death (sum)population=population, by(sex)
    and then append the two.

  • #2
    Here's an example - a hack, really - that might start you in a useful direction. Or at least provide some amusement at its hackishness.
    Code:
    . expand 2, generate(gnxl)
    (16 observations created)
    
    . replace sex=9 if gnxl==1
    (16 real changes made)
    
    . label define SEX 1 "Yes" 2 "No" 9 "Total"
    
    . label values sex SEX
    
    . drop gnxl
    
    . collapse (sum)death=death (sum)population=population, by(sex)
    
    . list, abbreviate(16)
    
         +----------------------------+
         |   sex   death   population |
         |----------------------------|
      1. |   Yes       3            8 |
      2. |    No       6            8 |
      3. | Total       9           16 |
         +----------------------------+

    Comment


    • #3
      Code:
      . tab sex death [fw=population]
      
                 |         death
             sex |         0          1 |     Total
      -----------+----------------------+----------
               1 |         5          3 |         8 
               2 |         2          6 |         8 
      -----------+----------------------+----------
           Total |         7          9 |        16 
      
      .

      Comment


      • #4
        Thanks. I need the result to be a dataset rather than an output.
        I think some variation of William's suggestion (or appending) will do the trick. I just wanted to check I wasn't overlooking something else available in the collapse command.

        Comment


        • #5
          Code:
          . contract sex death [fw=population]
          
          . l
          
               +---------------------+
               | sex   death   _freq |
               |---------------------|
            1. |   1       0       5 |
            2. |   1       1       3 |
            3. |   2       0       2 |
            4. |   2       1       6 |
               +---------------------+
          Marginal frequencies are then obtainable through egen

          Comment


          • #6
            That's potentially useful too, thanks Nick.

            The core of the problem is getting the sex variable to represent males, females, and total. William's method achieves this but I realise it's unconventional use of Stata. For some purposes it's very useful though, for example if I want to carry out further calculations on aggregated frequencies. The collapse command allows you to split the frequencies by any number of different variables (all in one go) but it doesn't allow you to retain the total, unless you hack or append.

            Comment


            • #7
              OK, There are many ways to do this but my bottom line is that I can't think of any that are really different from and better than suggestions so far, particularly William's in #2.
              You seem to be hinting at a wider context of wanting to do this again for categories of yet more variables.

              Comment


              • #8
                estout from Stata Journal/ SSC stores a matrix with the totals after saving results from tab. The only issue is that the matrix stacks the equations, so you have to unstack them. I had written a program to do this here. If you will be doing this often, then it may be more efficient to use this route. You may need to modify the code if your labels are not legal Stata names, using, e.g., the function -strtoname()-. As you want both the row names and column names in the dataset, the following additionally uses xsvmat from SSC to convert the matrix to a dataset as there is no straightforward way to add row names to the dataset using svmat.

                Code:
                cap program drop unstack
                program define unstack, eclass
                mat w= e(b)'
                local cols: roweq(w)
                local cols: list uniq cols
                local i 1
                foreach colname of local cols{
                    mat w`i'= w["`colname':", 1...]
                    local ++i
                }
                local w
                forval j=1/`=`i'-2'{
                    local w "`w' w`j',"
                }
                local w = "`w' w`=`i'-1'"
                mat w= `w'
                mat colnames w= `cols'
                end
                
                clear
                input float(sex death population)
                1 1 1
                1 0 1
                1 0 1
                1 1 1
                1 1 1
                1 0 1
                1 0 1
                1 0 1
                2 1 1
                2 1 1
                2 1 1
                2 0 1
                2 0 1
                2 1 1
                2 1 1
                2 1 1
                end
                lab def gender 1 "female" 2 "male"
                lab def status 0 "dead" 1 "alive"
                lab values sex gender
                lab values death status
                
                estpost tab sex death [fw=population]
                local rowvar= "`e(rowvar)'"
                unstack
                clear
                xsvmat w, rownames(`rowvar') names(matcol) norestore
                rename w* *
                Res.:

                Code:
                . l
                
                     +-------------------------------+
                     |    sex   dead   alive   Total |
                     |-------------------------------|
                  1. | female      5       3       8 |
                  2. |   male      2       6       8 |
                  3. |  Total      7       9      16 |
                     +-------------------------------+

                Comment


                • #9
                  Interesting. Thank you Andrew.

                  Comment


                  • #10
                    This looks like a job for what I call list tables.

                    Code:
                    . egen tag = tag(sex)
                    
                    . egen deaths = total(death), by(sex)
                    
                    . egen populi = total(population), by(sex)
                    
                    . list deaths populi if tag, noobs sum(deaths populi)
                    
                          +-----------------+
                          | deaths   populi |
                          |-----------------|
                          |      3        8 |
                          |      6        8 |
                          |-----------------|
                      Sum |      9       16 |
                          +-----------------+

                    Comment

                    Working...
                    X