Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating new values for an existing variable which is the sum of the current values within that variable

    Dear all,

    I am currently wondering how to achieve the following. I have a country sector year dataset, for which I have data for 10 sectors (a till j), per country and per year. An example of the data:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str8 country int year str4 sector double output
    "ARG" 1950 "a"  5578.795678457762
    "ARG" 1950 "c"  3162.352961932677
    "ARG" 1950 "b" 2192.9547535578636
    "ARG" 1950 "f"  8238.906783430726
    "ARG" 1950 "d" 13936.542546640336
    "ARG" 1950 "e"  687.2856438659917
    "ARG" 1950 "g" 2419.9959857454223
    "ARG" 1950 "j" 197.88615758071182
    "ARG" 1950 "i"  2603.393116487713
    "ARG" 1950 "h"   8414.82433396344
    "ARG" 1951 "a"  5941.018097253331
    "ARG" 1951 "c" 3446.8047678333733
    "ARG" 1951 "b" 2223.3408807702936
    "ARG" 1951 "f"  8238.906783430726
    "ARG" 1951 "d"  14284.57365853503
    "ARG" 1951 "e"  763.4055626890308
    "ARG" 1951 "g" 2453.5280528148082
    "ARG" 1951 "j" 200.62811746295333
    "ARG" 1951 "i" 2639.4663798750294
    "ARG" 1951 "h"  8531.422235615182
    "ARG" 1952 "a" 5257.0737190865775
    "ARG" 1952 "c"  2643.523995668816
    "ARG" 1952 "b"  2171.336429571928
    "ARG" 1952 "f"  8046.196871310385
    "ARG" 1952 "d" 13239.524193422663
    "ARG" 1952 "e"  750.1673159371979
    "ARG" 1952 "g" 2396.1394710683044
    "ARG" 1952 "j" 195.93537995524042
    "ARG" 1952 "i" 2577.7286581747103
    "ARG" 1952 "h"  8331.870320233235
    "ARG" 1953 "a"  6740.777451651082
    "ARG" 1953 "c" 2510.6175406379525
    "ARG" 1953 "b" 2204.7565084567063
    "ARG" 1953 "f"  8392.315057867974
    "ARG" 1953 "d" 12960.797368297975
    "ARG" 1953 "e"  790.2897169516673
    "ARG" 1953 "g"  2499.212694571243
    "ARG" 1953 "j"  204.3638088735519
    "ARG" 1953 "i"  2683.025139817849
    "ARG" 1953 "h"  8672.215153443365
    "ARG" 1954 "a"  6511.810826255326
    "ARG" 1954 "c"  2736.996667338873
    "ARG" 1954 "b"  2261.909686839372
    "ARG" 1954 "f"  8642.445002404527
    "ARG" 1954 "d" 14075.704668796727
    "ARG" 1954 "e"  818.2538146284186
    "ARG" 1954 "g"  2573.700833823366
    "ARG" 1954 "j" 210.45479900277695
    "ARG" 1954 "i"   2765.84657411867
    "ARG" 1954 "h"  8939.914955027132
    "ARG" 1955 "a"  6762.147670021353
    "ARG" 1955 "c"  2858.949293658401
    "ARG" 1955 "b"  2347.155105444025
    "ARG" 1955 "f"  8752.446126472425
    "ARG" 1955 "d" 15357.441163895655
    "ARG" 1955 "e"   851.081233640257
    "ARG" 1955 "g"    2606.4589230743
    "ARG" 1955 "j" 213.13346973187592
    "ARG" 1955 "i"  2956.088116583164
    "ARG" 1955 "h"  9554.823687297538
    "ARG" 1956 "a" 6631.4840491288405
    "ARG" 1956 "c" 2722.3915623904268
    "ARG" 1956 "b" 2394.6213044397973
    "ARG" 1956 "f"   9066.33468432222
    "ARG" 1956 "d" 15121.438888607598
    "ARG" 1956 "e"  880.2611616507802
    "ARG" 1956 "g" 2699.9342350769643
    "ARG" 1956 "j"  220.7770651881584
    "ARG" 1956 "i" 2948.1245171311616
    "ARG" 1956 "h"  9529.083321760638
    "ARG" 1957 "a"  6745.662072992858
    "ARG" 1957 "c"   3229.18870410152
    "ARG" 1957 "b"  2420.729413626463
    "ARG" 1957 "f"  8881.543149355144
    "ARG" 1957 "d" 15877.256520241333
    "ARG" 1957 "e"  920.3835626652493
    "ARG" 1957 "g" 2727.0119012555233
    "ARG" 1957 "j" 223.04791500152234
    "ARG" 1957 "i"  3045.111037974143
    "ARG" 1957 "h" 10416.063582964312
    "ARG" 1958 "a"  6843.354499828381
    "ARG" 1958 "c" 3301.4839735963305
    "ARG" 1958 "b"  2437.379461120535
    "ARG" 1958 "f"  8882.986833222076
    "ARG" 1958 "d"  16717.50600036174
    "ARG" 1958 "e"  978.7434186862957
    "ARG" 1958 "g"  2741.333972622695
    "ARG" 1958 "j" 209.92744941319748
    "ARG" 1958 "i"  3142.097558817124
    "ARG" 1958 "h" 10759.994704655532
    "ARG" 1959 "a"   6754.82073800869
    "ARG" 1959 "c"  2776.430450699679
    "ARG" 1959 "b"  2453.590940306762
    "ARG" 1959 "f"   9352.18408997442
    "ARG" 1959 "d" 14553.812726492359
    "ARG" 1959 "e" 1123.4272284051392
    "ARG" 1959 "g" 2750.7400376980886
    "ARG" 1959 "j" 201.34868345160044
    "ARG" 1959 "i" 3187.2693630453614
    "ARG" 1959 "h"  9643.409456394837
    end
    What I would like to do: Generate a new set of values within the variable sector, which would be the total output of the 10 sectors a till j. I would like to basically generate a new "group" for the sector variable, which would be then letter k, depicting the total output of the sectors per country and per year.

    I have seen examples where new variables are generated or values within an existing variable are replaced, which unfortunately cannot be applied to my case as far as I am aware. Any ideas on what I could do to achieve this would be greatly appreciated.

    Best,

    Satya

  • #2
    Well generating the total is quite easy:
    Code:
    by country year, sort: egen output_all_sectors = total(output)
    As for turning that into a new observation that you call sector k, that is such a horrendously bad way to organize data in Stata that I decline to show you how it can be done. You will get yourself in endless trouble with data analysis going forward if you do that. There are really good reasons why you haven't seen it done before! Stata is not a spreadsheet and trying to work with it as if it were usually ends badly.
    Last edited by Clyde Schechter; 23 Nov 2019, 13:57.

    Comment


    • #3
      Hi Clyde,

      Thank you for your quick reply and nice explanation. That indeed explains why I have not seen it been done before, and I am grateful for you explaining me why this is the case. If I understand your advice correctly, does this mean that it is crucial to generate the total of a group of sectors as a new variable rather than estimating it as a new group within an existing group identifier? E.g. for the variable sector I would have 5 different sectors, and I'd like to generate a 6th "sector" which is just the total of the 5 sectors --> This is a very bad idea if I understand correctly?

      Thank you for your time.

      Best,

      Satya

      Comment


      • #4
        Yes, you have understood perfectly!

        Comment


        • #5
          Hi Clyde,

          Thank you very much for your insight on this matter.

          Best,

          Satya

          Comment


          • #6
            Hi Satya, also you can get the totals by sector, country, year, etc in a table.

            So with the data you provided using dataex in post#1:

            Code:
            format output %10.1gc
            
            . table sector year if year<=1955, c(sum output) format(%9.1gc) row col
            
            -------------------------------------------------------------------------
                      |                             year                            
               sector |    1950     1951     1952     1953     1954     1955    Total
            ----------+--------------------------------------------------------------
                    a |   5,579    5,941    5,257    6,741    6,512    6,762   36,792
                    b |   2,193    2,223    2,171    2,205    2,262    2,347   13,401
                    c |   3,162    3,447    2,644    2,511    2,737    2,859   17,359
                    d |  13,937   14,285   13,240   12,961   14,076   15,357   83,855
                    e |     687      763      750      790      818      851    4,660
                    f |   8,239    8,239    8,046    8,392    8,642    8,752   50,311
                    g |   2,420    2,454    2,396    2,499    2,574    2,606   14,949
                    h |   8,415    8,531    8,332    8,672    8,940    9,555   52,445
                    i |   2,603    2,639    2,578    2,683    2,766    2,956   16,226
                    j |     198      201      196      204      210      213    1,222
                      |
                Total |  47,433   48,723   45,609   47,658   49,537   52,260  291,221
            -------------------------------------------------------------------------
            
            
            
            . table year, c(sum output) format(%9.1fc) row col
            
            -----------------------
                 year | sum(output)
            ----------+------------
                 1950 |    47,432.9
                 1951 |    48,723.1
                 1952 |    45,609.5
                 1953 |    47,658.4
                 1954 |    49,537.0
                 1955 |    52,259.7
                 1956 |    52,214.5
                 1957 |    54,486.0
                 1958 |    56,014.8
                 1959 |    52,797.0
                      |
                Total |   506,733.0
            -----------------------

            I also created a small toy dataset with a few countries and a few years:
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str9 country int year str1 sector long gdp
            "Argentina" 1950 "a" 54555
            "Argentina" 1950 "b" 76575
            "Argentina" 1950 "c" 83363
            "Argentina" 1950 "d" 61883
            "Argentina" 1951 "a" 55429
            "Argentina" 1951 "b" 81982
            "Argentina" 1951 "c" 74722
            "Argentina" 1951 "d" 52778
            "Brazil"    1950 "a" 70239
            "Brazil"    1950 "b" 64684
            "Brazil"    1950 "c" 73585
            "Brazil"    1950 "d" 54478
            "Brazil"    1951 "a" 59976
            "Brazil"    1951 "b" 60459
            "Brazil"    1951 "c" 77584
            "Brazil"    1951 "d" 72047
            "Canada"    1950 "a" 58784
            "Canada"    1950 "b" 74998
            "Canada"    1950 "c" 75177
            "Canada"    1950 "d" 64467
            "Canada"    1951 "a" 58883
            "Canada"    1951 "b" 76851
            "Canada"    1951 "c" 65824
            "Canada"    1951 "d" 73421
            end
            
            . list, sepby(country year) noobs
            
              +------------------------------------+
              |   country   year   sector      gdp |
              |------------------------------------|
              | Argentina   1950        a   54,555 |
              | Argentina   1950        b   76,575 |
              | Argentina   1950        c   83,363 |
              | Argentina   1950        d   61,883 |
              |------------------------------------|
              | Argentina   1951        a   55,429 |
              | Argentina   1951        b   81,982 |
              | Argentina   1951        c   74,722 |
              | Argentina   1951        d   52,778 |
              |------------------------------------|
              |    Brazil   1950        a   70,239 |
              |    Brazil   1950        b   64,684 |
              |    Brazil   1950        c   73,585 |
              |    Brazil   1950        d   54,478 |
              |------------------------------------|
              |    Brazil   1951        a   59,976 |
              |    Brazil   1951        b   60,459 |
              |    Brazil   1951        c   77,584 |
              |    Brazil   1951        d   72,047 |
              |------------------------------------|
              |    Canada   1950        a   58,784 |
              |    Canada   1950        b   74,998 |
              |    Canada   1950        c   75,177 |
              |    Canada   1950        d   64,467 |
              |------------------------------------|
              |    Canada   1951        a   58,883 |
              |    Canada   1951        b   76,851 |
              |    Canada   1951        c   65,824 |
              |    Canada   1951        d   73,421 |
              +------------------------------------+
            
            
            egen country_gdp = total(gdp), by(country year)  // this is the same as Clyde's answer
            egen sector_gdp  = total(gdp), by(sector year)  // this creates the Sector total 
            format country_gdp sector_gdp %10.1gc
            sort sector year country
            
            . list, sepby(sector year) noobs abbrev(12)
            
              +---------------------------------------------------------------+
              |   country   year   sector      gdp   country_gdp   sector_gdp |
              |---------------------------------------------------------------|
              | Argentina   1950        a   54,555       276,376      183,578 |
              |    Brazil   1950        a   70,239       262,986      183,578 |
              |    Canada   1950        a   58,784       273,426      183,578 |
              |---------------------------------------------------------------|
              | Argentina   1951        a   55,429       264,911      174,288 |
              |    Brazil   1951        a   59,976       270,066      174,288 |
              |    Canada   1951        a   58,883       274,979      174,288 |
              |---------------------------------------------------------------|
              | Argentina   1950        b   76,575       276,376      216,257 |
              |    Brazil   1950        b   64,684       262,986      216,257 |
              |    Canada   1950        b   74,998       273,426      216,257 |
              |---------------------------------------------------------------|
              | Argentina   1951        b   81,982       264,911      219,292 |
              |    Brazil   1951        b   60,459       270,066      219,292 |
              |    Canada   1951        b   76,851       274,979      219,292 |
              |---------------------------------------------------------------|
              | Argentina   1950        c   83,363       276,376      232,125 |
              |    Brazil   1950        c   73,585       262,986      232,125 |
              |    Canada   1950        c   75,177       273,426      232,125 |
              |---------------------------------------------------------------|
              | Argentina   1951        c   74,722       264,911      218,130 |
              |    Brazil   1951        c   77,584       270,066      218,130 |
              |    Canada   1951        c   65,824       274,979      218,130 |
              |---------------------------------------------------------------|
              | Argentina   1950        d   61,883       276,376      180,828 |
              |    Brazil   1950        d   54,478       262,986      180,828 |
              |    Canada   1950        d   64,467       273,426      180,828 |
              |---------------------------------------------------------------|
              | Argentina   1951        d   52,778       264,911      198,246 |
              |    Brazil   1951        d   72,047       270,066      198,246 |
              |    Canada   1951        d   73,421       274,979      198,246 |
              +---------------------------------------------------------------+
            
            
            sort country year sector
            list, sepby(country year) noobs abbrev(12)
            
              +---------------------------------------------------------------+
              |   country   year   sector      gdp   country_gdp   sector_gdp |
              |---------------------------------------------------------------|
              | Argentina   1950        a   54,555       276,376      183,578 |
              | Argentina   1950        b   76,575       276,376      216,257 |
              | Argentina   1950        c   83,363       276,376      232,125 |
              | Argentina   1950        d   61,883       276,376      180,828 |
              |---------------------------------------------------------------|
              | Argentina   1951        a   55,429       264,911      174,288 |
              | Argentina   1951        b   81,982       264,911      219,292 |
              | Argentina   1951        c   74,722       264,911      218,130 |
              | Argentina   1951        d   52,778       264,911      198,246 |
              |---------------------------------------------------------------|
              |    Brazil   1950        a   70,239       262,986      183,578 |
              |    Brazil   1950        b   64,684       262,986      216,257 |
              |    Brazil   1950        c   73,585       262,986      232,125 |
              |    Brazil   1950        d   54,478       262,986      180,828 |
              |---------------------------------------------------------------|
              |    Brazil   1951        a   59,976       270,066      174,288 |
              |    Brazil   1951        b   60,459       270,066      219,292 |
              |    Brazil   1951        c   77,584       270,066      218,130 |
              |    Brazil   1951        d   72,047       270,066      198,246 |
              |---------------------------------------------------------------|
              |    Canada   1950        a   58,784       273,426      183,578 |
              |    Canada   1950        b   74,998       273,426      216,257 |
              |    Canada   1950        c   75,177       273,426      232,125 |
              |    Canada   1950        d   64,467       273,426      180,828 |
              |---------------------------------------------------------------|
              |    Canada   1951        a   58,883       274,979      174,288 |
              |    Canada   1951        b   76,851       274,979      219,292 |
              |    Canada   1951        c   65,824       274,979      218,130 |
              |    Canada   1951        d   73,421       274,979      198,246 |
              +---------------------------------------------------------------+
            
            
            *** You can create the various totals using the "Table" command
            . table country year, c(sum gdp) format(%10.1gc) row col
            
            ----------------------------------------------
                      |                year              
              country |       1950        1951       Total
            ----------+-----------------------------------
            Argentina |    276,376     264,911     541,287
               Brazil |    262,986     270,066     533,052
               Canada |    273,426     274,979     548,405
                      |
                Total |    812,788     809,956   1,622,744
            ----------------------------------------------
            
            . table sector year, c(sum gdp) format(%10.1gc) row col
            
            ----------------------------------------------
                      |                year              
               sector |       1950        1951       Total
            ----------+-----------------------------------
                    a |    183,578     174,288     357,866
                    b |    216,257     219,292     435,549
                    c |    232,125     218,130     450,255
                    d |    180,828     198,246     379,074
                      |
                Total |    812,788     809,956   1,622,744
            ----------------------------------------------
            
            
            *** Summarizing the data in various ways
            . table sector year, by(country) c(sum gdp) format(%10.1gc) row col
            
            ----------------------------------------------
            country   |
            and       |                year              
            sector    |       1950        1951       Total
            ----------+-----------------------------------
            Argentina |
                    a |     54,555      55,429     109,984
                    b |     76,575      81,982     158,557
                    c |     83,363      74,722     158,085
                    d |     61,883      52,778     114,661
                      |
                Total |    276,376     264,911     541,287
            ----------+-----------------------------------
            Brazil    |
                    a |     70,239      59,976     130,215
                    b |     64,684      60,459     125,143
                    c |     73,585      77,584     151,169
                    d |     54,478      72,047     126,525
                      |
                Total |    262,986     270,066     533,052
            ----------+-----------------------------------
            Canada    |
                    a |     58,784      58,883     117,667
                    b |     74,998      76,851     151,849
                    c |     75,177      65,824     141,001
                    d |     64,467      73,421     137,888
                      |
                Total |    273,426     274,979     548,405
            ----------------------------------------------
            
            
            . table sector country , by(year) c(sum gdp) format(%10.1gc) row col
            
            ----------------------------------------------------------
            year and  |                    country                    
            sector    |  Argentina      Brazil      Canada       Total
            ----------+-----------------------------------------------
            1950      |
                    a |     54,555      70,239      58,784     183,578
                    b |     76,575      64,684      74,998     216,257
                    c |     83,363      73,585      75,177     232,125
                    d |     61,883      54,478      64,467     180,828
                      |
                Total |    276,376     262,986     273,426     812,788
            ----------+-----------------------------------------------
            1951      |
                    a |     55,429      59,976      58,883     174,288
                    b |     81,982      60,459      76,851     219,292
                    c |     74,722      77,584      65,824     218,130
                    d |     52,778      72,047      73,421     198,246
                      |
                Total |    264,911     270,066     274,979     809,956
            ----------------------------------------------------------
            
            
            . table sector year country, c(sum gdp) format(%10.1gc) row col
            
            --------------------------------------------------------------------------------------------------------------------------
                      |                                                country and year                                              
                      | ------------ Argentina -----------    ------------- Brazil -------------    ------------- Canada -------------
               sector |       1950        1951       Total          1950        1951       Total          1950        1951       Total
            ----------+---------------------------------------------------------------------------------------------------------------
                    a |     54,555      55,429     109,984        70,239      59,976     130,215        58,784      58,883     117,667
                    b |     76,575      81,982     158,557        64,684      60,459     125,143        74,998      76,851     151,849
                    c |     83,363      74,722     158,085        73,585      77,584     151,169        75,177      65,824     141,001
                    d |     61,883      52,778     114,661        54,478      72,047     126,525        64,467      73,421     137,888
                      |
                Total |    276,376     264,911     541,287       262,986     270,066     533,052       273,426     274,979     548,405
            --------------------------------------------------------------------------------------------------------------------------
            Last edited by David Benson; 23 Nov 2019, 23:43.

            Comment


            • #7
              Hi David,

              Thank you for your reply. This is also a nice idea to try out. Thank you.

              Best,

              Satya

              Comment

              Working...
              X