Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a new row that is the sum of selected rows

    Hi everyone,

    I have a dataset summarising the population of each district in a country, with 5 different sub-populations, and 1 total.
    That is, for district "Paris", I have 6 observations: Total, Refugees, IDPs, Other Affected Populations, Unknown, Disabled. (see bottom of my message)

    For each district, I want to check that Total = Refugees + IDPs + OAP + Unknown.
    (Notice one difficulty: Disabled is not included in the total !! i.e. Total ≠ Refugees + IDPs + OAP + Unknown + Disabled)

    My idea would be to create a new row for each district, checking that the numbers match. If the total is correct, then Number = 0. Otherwise, number = 1.
    In the case of the district of Paris, the row would look like this:
    Paris checktotal 0
    If feel like the solution probably includes a "bysort", but I can't quite get to it... My first "draft" is: bysort(District): egen checktotal = 0 if sum(Number)==TOTAL

    Could you please help me ?

    Many thanks !!
    District Population Number
    Paris Refugees 3
    Paris IDPs 3
    Paris OtherAffected 3
    Paris Unknown 3
    Paris TOTAL 12
    Paris Disabled 4
    Lyon Refugees 5
    Lyon IDPs 5
    Lyon OtherAffected 5
    Lyon unknown 5
    Lyon Total 20
    Lyon Disabled 11

  • #2
    I can help you, but please read section 12 of the FAQ to reformat this question so I can read it, at present I've not the foggiest idea what's going on here.

    Comment


    • #3
      One, somewhat silly suggestion,
      Code:
      by district (pop), sort: egen wanted = max(num[2]+num[3]+num[4]+num[6]==num[5])

      Comment


      • #4
        Hi Jared,

        I am not sure what you want me to do - here's an attempt to make it clearer:

        When I try:
        Code:
        bysort(GMGRN): egen checktotal = 0 if sum(`v')==TOTAL


        The following error message appears:
        Code:
        Invalid Syntax


        Does this help ?

        Many thanks fory our time !

        Comment


        • #5
          No no, I need to see your real dataset just as you're seeing it. So, go to the Stata terminal and type in dataex.

          This shows all of us a small sample of your data that you're using such that we can make intelligible suggestions instead of us just guessing.

          Copy from the first [CODE] to the very last CODE bracket, paste that exact screen here, and that will allow us to see your real data, not a table that I can't actually use on my machine.

          Comment


          • #6
            Stata style is very much that totals belong in a new variable, not dedicated observations. Otherwise you need to keep excluding observations from analysis.

            I will post some code shortly.

            Comment


            • #7
              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input str5 district str13 population byte number
              "Paris" "Refugees"       3
              "Paris" "IDPs"           3
              "Paris" "OtherAffected"  3
              "Paris" "Unknown"        3
              "Paris" "TOTAL"         12
              "Paris" "Disabled"       4
              "Lyon"  "Refugees"       5
              "Lyon"  "IDPs"           5
              "Lyon"  "OtherAffected"  5
              "Lyon"  "unknown"        5
              "Lyon"  "Total"         20
              "Lyon"  "Disabled"      11
              end
              
              . egen total = total(cond(lower(pop) == "total", number, .)) , by(district)
              
              . drop if lower(pop) == "total"
              
              . egen check = total(number), by(district)
              
              . list, sepby(district)
              
                   +---------------------------------------------------+
                   | district      population   number   total   check |
                   |---------------------------------------------------|
                1. |    Paris        Refugees        3      12      16 |
                2. |    Paris            IDPs        3      12      16 |
                3. |    Paris   OtherAffected        3      12      16 |
                4. |    Paris         Unknown        3      12      16 |
                5. |    Paris        Disabled        4      12      16 |
                   |---------------------------------------------------|
                6. |     Lyon        Refugees        5      20      31 |
                7. |     Lyon            IDPs        5      20      31 |
                8. |     Lyon   OtherAffected        5      20      31 |
                9. |     Lyon         unknown        5      20      31 |
               10. |     Lyon        Disabled       11      20      31 |
                   +---------------------------------------------------+
              .

              Comment


              • #8
                Hi Nick, many thanks for your help !

                The code is helpful but does not quite do the trick, although it put me in the right direction to solve my problem.
                My idea was to check that the value of "total" for each district was correct. The code you suggested returned the value of the reportedtotal in any case, even if this reported total was wrong.

                I have slightly modified the example, to help. In my new example below, the reported total for the district of Paris is wrong (the correct total is 12, but the reported total is 11). However, my code spots this issue !

                Man ythanks for your help, which was instrumental in helping me find the right code

                * Example generated by -dataex-. For more info, type help dataex . input str5 district str13 population byte reportednumber

                district population reportednumber
                1. "Paris" "Disabled" 2
                2. "Paris" "IDPs" 3
                3. "Paris" "OtherAffected" 3
                4. "Paris" "Refugees" 3
                5. "Paris" "total" 11
                6. "Paris" "Unknown" 3
                7. "Lyon" "Disabled" 11
                8. "Lyon" "IDPs" 5
                9. "Lyon" "OtherAffected" 5
                10. "Lyon" "Refugees" 5
                11. "Lyon" "total" 20
                12. "Lyon" "Unknown" 5
                13. end

                .
                . egen Correct_Total = total(cond(pop =="IDPs" | pop =="Refugees" | pop =="OtherAffected" | pop =="Unknown", reportednumber,.)) , by(district)

                .
                . egen Reported_Total = total(cond(pop=="total", reportednumber,.)) , by(district)

                .
                . drop if population == "Disabled" | population == "IDPs" | population == "Refugees" | population == "OtherAffected" | population == "Unknown"
                (10 observations deleted)

                .
                . gen check = 0 if Correct_Total==Reported_Total
                (1 missing value generated)

                . replace check = 1 if check==.
                (1 real change made)
                Last edited by remi de las; 17 Mar 2022, 04:51.

                Comment

                Working...
                X