Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • creating a simple two summary table

    I hvae a simple question that I cannot find the answer to. I have a data set with two category variable and one continous variable. The categories are state_code and ind_code and the variable is revenue. I simply want to create a dataset with the revenue data summed up state_code and ind_code.

    Sample data

    state_code ind_code revenue
    AL 1 100
    AL 1 200
    AL 2 50
    CA 1 300
    CA 2 100
    CA 2 50

    Desired outcome
    1 2
    AL 300 50
    CA 300 150


    It would be nice to have the row and column totals too
    Last edited by Thomas Anderson Economist; 16 Dec 2024, 16:28.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str2 state_code byte ind_code int revenue
    "AL" 1 100
    "AL" 1 200
    "AL" 2  50
    "CA" 1 300
    "CA" 2 100
    "CA" 2  50
    end
    
    table state_code ind_code, stat(sum revenue) nformat(%5.0f)
    Res.:

    Code:
    . table state_code ind_code, stat(sum revenue) nformat(%5.0f)
    
    -------------------------------
               |       ind_code    
               |    1     2   Total
    -----------+-------------------
    state_code |                   
      AL       |  300    50     350
      CA       |  300   150     450
      Total    |  600   200     800
    -------------------------------

    See

    Code:
    help collect export
    to export the table.

    Comment


    • #3
      Andrew, Thank you for your input. That does work, but I was hoping to create a new STATA dataset that I could do further work on in the program. Does anyone know if that is possible.?

      Comment


      • #4
        This is just a simple application of collapse. Column totals reflect spreadsheet thinking, and I do not see how they would be useful in a Stata dataset—unless you can convince me otherwise. Note the correct spelling of Stata: https://www.statalist.org/forums/help#spelling.

        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str2 state_code byte ind_code int revenue
        "AL" 1 100
        "AL" 1 200
        "AL" 2  50
        "CA" 1 300
        "CA" 2 100
        "CA" 2  50
        end
        
        
        separate revenue, by(ind_code) gen(industry)
        collapse (sum) industry? total=revenue, by(state_code)
        Res.:

        Code:
        . l
        
             +----------------------------------------+
             | state_~e   indust~1   indust~2   total |
             |----------------------------------------|
          1. |       AL        300         50     350 |
          2. |       CA        300        150     450 |
             +----------------------------------------+

        Comment


        • #5
          In any case, the trick to get column totals is to duplicate the data.

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str2 state_code byte ind_code int revenue
          "AL" 1 100
          "AL" 1 200
          "AL" 2  50
          "CA" 1 300
          "CA" 2 100
          "CA" 2  50
          end
          
          expand 2, gen(new)
          replace state_code="Total" if new
          separate revenue, by(ind_code) gen(industry)
          collapse (sum) industry? total=revenue, by(state_code)
          Res.:

          Code:
          . l
          
               +----------------------------------------+
               | state_~e   indust~1   indust~2   total |
               |----------------------------------------|
            1. |       AL        300         50     350 |
            2. |       CA        300        150     450 |
            3. |    Total        600        200     800 |
               +----------------------------------------+

          Comment

          Working...
          X