Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Get and sort total variable amount by value

    Hello everyone,

    I am new to Stata (like many of the posters here) and am trying to analyzte a rather large project dataset. The set gives information on certain projects, donors and recipients, as well as project amount. Just as an example, lets say this is the dataset:
    project donor recipient amount
    1 Germany Brazil 100
    2 Netherlands South Africa 150
    3 Switzerland Nicaragua 300
    4 USA Brazil 450
    5 China Nicaragua 10
    6 Russia South Africa 50











    I want to find out who the main recipients by project amount are (this example is small, but lets say I want to focus on the 10 largest recipient countries in my dataset.)
    E.g.:
    Brazil 550
    Nicaragua 310
    South Africa 200

    And I can't, for the life of me, figure out how to do that. sum() wont work, "total" works to calculate one recipient, but I have somewhat over 200 (including unspecified and NGOs etc). Bar graphs are unreadable if don't restrict them to a certain number of countries on the y-axis.

    I suppose this is one of these things thats pretty easy, but I am completely stuck on it.

    Thank you very much in advance for any support.



  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte project str12 donor str13 recipient int amount
    1 "Germany "     "Brazil "       100
    2 "Netherlands " "South Africa " 150
    3 "Switzerland " "Nicaragua "    300
    4 "USA "         "Brazil "       450
    5 "China "       "Nicaragua "     10
    6 "Russia "      "South Africa "  50
    end
    
    collapse (sum) amount, by(recipient)
    gsort -amount
    gen byte top_ten = (_n <= 10)
    As you only have three different recipients in your example, all of them are in the "top ten."

    Note also that "top ten" is inherently ill-defined. If there are ties for total amounts received, and you order the countries by that, it may be that the countries in positions 10 and 11 have the same amount. In that case it is not sensible to include one but not the other (or others--there could be multiple ties at that level). Of course, you may be lucky and not have the problem of ties.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Added: One additional thought. I notice that all of your country names have a blank space on the right. Perhaps that is an artifact of the way you created your tableau, but if that is true in your data, you may run into problems if it isn't true consistently in the full data set. Stata will not recognize "Brazil " and "Brazil" as the same country. As a precaution, before you do anything further with your data, I recommend you assure uniformity along these lines by running:
    Code:
    replace donor = trim(itrim(donor))
    replace recipient = trim(itrim(recipient))
    to strip out any non-essential blanks. You might also have to think about variations in capitalization and fix those as well. See -help upper()-, -help lower()-, and -help proper()-.
    Last edited by Clyde Schechter; 05 Mar 2022, 11:25.

    Comment


    • #3
      Code:
      egen rtotal = total(amount), by(recipient)
      egen tag = tag(country)
      gsort -tag -rtotal
      list country rtotal in 1/10

      Comment


      • #4
        Should be

        Code:
         
         egen tag = tag(recipient)

        Comment


        • #5
          More or less the same idea as #3 but some note of fair warnings from Clyde Schechter about ties and spaces.


          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte project str12 donor str13 recipient int amount
          1 "Germany "     "Brazil "       100
          2 "Netherlands " "South Africa " 150
          3 "Switzerland " "Nicaragua "    300
          4 "USA "         "Brazil "       450
          5 "China "       "Nicaragua "     10
          6 "Russia "      "South Africa "  50
          7 "somewhere"    "elsewhere"     200 
          end
          
          foreach v in donor recipient { 
              replace `v' = trim(`v')
          }
          
          egen rtotal = total(amount), by(recipient)
          egen tag = tag(recipient)
          egen rank = rank(rtotal) if tag, field 
          sort rank 
          list rank recipient rtotal if rank <= 10 
          
               +------------------------------+
               | rank      recipient   rtotal |
               |------------------------------|
            1. |    1         Brazil      550 |
            2. |    2      Nicaragua      310 |
            3. |    3   South Africa      200 |
            4. |    3      elsewhere      200 |
               +------------------------------+

          Comment


          • #6
            I am sorry for replying so late!
            Thank you very much for your help, this worked perfectly on the first try!

            Comment

            Working...
            X