Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Separating cumulative data into actuals

    Hi all

    First of all, apologies if I am not able to explain this very well. I am very new to Stata and still struggling with the lingo.

    Essentially, I have a set of observations from different countries of the number of confirmed cases of a certain disease. The data is cumulative and each variable represents a single day in a series of days. An example below:




    countryre~n | v5 | v6 | v7 | v8 | v9 | v10 | v11 | v12 | v13 | v14 | v15 | v16 | v17 | v18 | v19 | v20 | v21 |
    | Australia | 0 | 0 | 0 | 0 | 4 | 5 | 5 | 6 | 9 | 9 | 12 | 12 | 12 | 13 | 13 | 14 | 15 |
    |-----------------------+-----------------------------------------------------------------------------------------|
    | v22 | v23 | v24 | v25 | v26 | v27 | v28 | v29 | v30 | v31 | v32 | v33 | v34 | v35 | v36 | v37 | v38 | v39 | v40 |
    | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 19 | 22 | 22 | 22 | 22 | 22 |


    I need to separate this cumulative data into each individual day, i.e essentially I want 40 different variables that are xn = vn - vn-1 if that makes sense.

    Is there a function I can use for this or will I have to separately generate a variable?




  • #2
    As a new user with concerns about posting a clear question, you would particularly benefit from reading the StataList FAQ. In particular, you can learn about how to use the -dataex- command to post example data that will make it possible for people to efficiently and effectively help you.

    Comment


    • #3
      Please do read and act on FAQ Advice #12 to give a data example in useful form.

      I can't get 40 different variables out of 36, and in any case, you would be much better off using the reshape command (not function) here.

      This should help, however.

      Code:
      clear 
      input str42 country  v5  v6  v7  v8  v9  v10  v11  v12  v13  v14  v15  v16  v17  v18  v19  v20  v21   v22  v23  v24  v25  v26  v27  v28  v29  v30  v31  v32  v33  v34  v35  v36  v37  v38  v39  v40 
      Australia  0  0  0  0  4  5  5  6  9  9  12  12  12  13  13  14  15  15  15  15  15  15  15  15  15  15  15  15  15  15  19  22  22  22  22  22 
      end 
      
      reshape long v, i(country) j(t) 
      rename v cumul
      replace t = t -4
      tsset t
      gen actual = cond(t == 1, cumul, D.cumul)
      list 
      
           +---------------------------------+
           |   country    t   cumul   actual |
           |---------------------------------|
        1. | Australia    1       0        0 |
        2. | Australia    2       0        0 |
        3. | Australia    3       0        0 |
        4. | Australia    4       0        0 |
        5. | Australia    5       4        4 |
           |---------------------------------|
        6. | Australia    6       5        1 |
        7. | Australia    7       5        0 |
        8. | Australia    8       6        1 |
        9. | Australia    9       9        3 |
       10. | Australia   10       9        0 |
           |---------------------------------|
       11. | Australia   11      12        3 |
       12. | Australia   12      12        0 |
       13. | Australia   13      12        0 |
       14. | Australia   14      13        1 |
       15. | Australia   15      13        0 |
           |---------------------------------|
       16. | Australia   16      14        1 |
       17. | Australia   17      15        1 |
       18. | Australia   18      15        0 |
       19. | Australia   19      15        0 |
       20. | Australia   20      15        0 |
           |---------------------------------|
       21. | Australia   21      15        0 |
       22. | Australia   22      15        0 |
       23. | Australia   23      15        0 |
       24. | Australia   24      15        0 |
       25. | Australia   25      15        0 |
           |---------------------------------|
       26. | Australia   26      15        0 |
       27. | Australia   27      15        0 |
       28. | Australia   28      15        0 |
       29. | Australia   29      15        0 |
       30. | Australia   30      15        0 |
           |---------------------------------|
       31. | Australia   31      19        4 |
       32. | Australia   32      22        3 |
       33. | Australia   33      22        0 |
       34. | Australia   34      22        0 |
       35. | Australia   35      22        0 |
           |---------------------------------|
       36. | Australia   36      22        0 |
           +---------------------------------+
      .

      Comment


      • #4
        Thank you both. Hopefully this will help:

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str43 provincestate str20 countryregion float(lat long1) int(v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18 v19


        I hit the data width limit but basically the variables go on till v41. Each vn is a particular date and records how many total cases of the disease were recorded TILL this particular day

        I am interested in number of cases reported FOR a particular day - if that makes sense. So essentially subtracting a column from the previous column

        Also, since I don't care about province level data, I ran the following command:

        Code:
        collapse (sum) v*, by(countryregion)

        The observation for Australia above is just one particular entry.

        Here are a few more

        Code:
         countryre~n | v5 | v6 | v7 | v8 | v9 | v10 | v11 | v12 | v13 | v14 | v15 | v16 | v17 | v18 | v19  | v20  | v21  |
             |     Bahrain |  0 |  0 |  0 |  0 |  0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0  |   0  |   0  |
             |-----------------------+-----------------------------------------------------------------------------------------|
             | v22 | v23 | v24 | v25 | v26 | v27 | v28 | v29 | v30 | v31 | v32 | v33 | v34 | v35 | v36 | v37 | v38 | v39 | v40 |
             |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   0 |   1 |  23 |  33 |
             |-----------------------------------------------------------------------------------------------------------------|
             |                                                       v41                                                       |
             |                                                        33    
        
        
        
        
        
        countryre~n | v5 | v6 | v7 | v8 | v9 | v10 | v11 | v12 | v13 | v14 | v15 | v16 | v17 | v18 | v19  | v20  | v21  |
             |      Canada |  0 |  0 |  0 |  0 |  1 |   1 |   2 |   2 |   2 |   4 |   4 |   4 |   4 |   4 |   5  |   5  |   7  |
             |-----------------------+-----------------------------------------------------------------------------------------|
             | v22 | v23 | v24 | v25 | v26 | v27 | v28 | v29 | v30 | v31 | v32 | v33 | v34 | v35 | v36 | v37 | v38 | v39 | v40 |
             |   7 |   7 |   7 |   7 |   7 |   7 |   7 |   7 |   7 |   8 |   8 |   8 |   8 |   9 |   9 |   9 |  10 |  11 |  11 |
             |-----------------------------------------------------------------------------------------------------------------|
             |                                                       v41                                                       |
             |                                                        13                                                       |
             +-----------------------------------------------------------------------------------------------------------------+
        I know I can just generate newer variables using something like this, but I will have to do it for 36 days:

        Code:
        gen jan23 = v6-v5

        Corollary to this, I want to do something else, and maybe this deserves a new thread, but I wanted to make a line series graph for each country. In my understanding, a time series graph will need two variables, I have 37(?). What do I do?

        Comment


        • #5
          Originally posted by Nick Cox View Post
          Please do read and act on FAQ Advice #12 to give a data example in useful form.

          I can't get 40 different variables out of 36, and in any case, you would be much better off using the reshape command (not function) here.

          This should help, however.

          Code:
          clear
          input str42 country v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18 v19 v20 v21 v22 v23 v24 v25 v26 v27 v28 v29 v30 v31 v32 v33 v34 v35 v36 v37 v38 v39 v40
          Australia 0 0 0 0 4 5 5 6 9 9 12 12 12 13 13 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15 19 22 22 22 22 22
          end
          
          reshape long v, i(country) j(t)
          rename v cumul
          replace t = t -4
          tsset t
          gen actual = cond(t == 1, cumul, D.cumul)
          list
          
          +---------------------------------+
          | country t cumul actual |
          |---------------------------------|
          1. | Australia 1 0 0 |
          2. | Australia 2 0 0 |
          3. | Australia 3 0 0 |
          4. | Australia 4 0 0 |
          5. | Australia 5 4 4 |
          |---------------------------------|
          6. | Australia 6 5 1 |
          7. | Australia 7 5 0 |
          8. | Australia 8 6 1 |
          9. | Australia 9 9 3 |
          10. | Australia 10 9 0 |
          |---------------------------------|
          11. | Australia 11 12 3 |
          12. | Australia 12 12 0 |
          13. | Australia 13 12 0 |
          14. | Australia 14 13 1 |
          15. | Australia 15 13 0 |
          |---------------------------------|
          16. | Australia 16 14 1 |
          17. | Australia 17 15 1 |
          18. | Australia 18 15 0 |
          19. | Australia 19 15 0 |
          20. | Australia 20 15 0 |
          |---------------------------------|
          21. | Australia 21 15 0 |
          22. | Australia 22 15 0 |
          23. | Australia 23 15 0 |
          24. | Australia 24 15 0 |
          25. | Australia 25 15 0 |
          |---------------------------------|
          26. | Australia 26 15 0 |
          27. | Australia 27 15 0 |
          28. | Australia 28 15 0 |
          29. | Australia 29 15 0 |
          30. | Australia 30 15 0 |
          |---------------------------------|
          31. | Australia 31 19 4 |
          32. | Australia 32 22 3 |
          33. | Australia 33 22 0 |
          34. | Australia 34 22 0 |
          35. | Australia 35 22 0 |
          |---------------------------------|
          36. | Australia 36 22 0 |
          +---------------------------------+
          .
          Hi Nick - thanks for this

          I applied this command. However it generates repeated time values in sample and does not generate the actuals column (as there are more countries and not just Australia) I am also unable to declare the dataset to be a time series data- is there a way around this, or the only possible way is to do this country wise?
          Last edited by Kartikeya Bhatotia; 01 Mar 2020, 15:01.

          Comment


          • #6
            You need to xtset with more than one country. For that you need a numeric identifier.

            Comment


            • #7
              Demonstration of assertions in #6.


              Code:
              clear
              input str42 countryregion v5 v6 v7 v8 v9 v10 v11 v12 v13 v14 v15 v16 v17 v18 v19 v20 v21 v22 v23 v24 v25 v26 v27 v28 v29 v30 v31 v32 v33 v34 v35 v36 v37 v38 v39 v40 v41 
              Australia 0 0 0 0 4 5 5 6 9 9 12 12 12 13 13 14 15 15 15 15 15 15 15 15 15 15 15 15 15 15 19 22 22 22 22 22 . 
              Bahrain 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 23 33 33 
              end
              
              reshape long v, i(country) j(t)
              rename v cumul
              replace t = t -4
              egen id = group(countryregion), label 
              xtset id t
              gen actual = cond(t == 1, cumul, D.cumul)
              list
              
                  +---------------------------------------------+
                   | country~n    t   cumul          id   actual |
                   |---------------------------------------------|
                1. | Australia    1       0   Australia        0 |
                2. | Australia    2       0   Australia        0 |
                3. | Australia    3       0   Australia        0 |
                4. | Australia    4       0   Australia        0 |
                5. | Australia    5       4   Australia        4 |
                   |---------------------------------------------|
                6. | Australia    6       5   Australia        1 |
                7. | Australia    7       5   Australia        0 |
                8. | Australia    8       6   Australia        1 |
                9. | Australia    9       9   Australia        3 |
               10. | Australia   10       9   Australia        0 |
                   |---------------------------------------------|
               11. | Australia   11      12   Australia        3 |
               12. | Australia   12      12   Australia        0 |
               13. | Australia   13      12   Australia        0 |
               14. | Australia   14      13   Australia        1 |
               15. | Australia   15      13   Australia        0 |
                   |---------------------------------------------|
               16. | Australia   16      14   Australia        1 |
               17. | Australia   17      15   Australia        1 |
               18. | Australia   18      15   Australia        0 |
               19. | Australia   19      15   Australia        0 |
               20. | Australia   20      15   Australia        0 |
                   |---------------------------------------------|
               21. | Australia   21      15   Australia        0 |
               22. | Australia   22      15   Australia        0 |
               23. | Australia   23      15   Australia        0 |
               24. | Australia   24      15   Australia        0 |
               25. | Australia   25      15   Australia        0 |
                   |---------------------------------------------|
               26. | Australia   26      15   Australia        0 |
               27. | Australia   27      15   Australia        0 |
               28. | Australia   28      15   Australia        0 |
               29. | Australia   29      15   Australia        0 |
               30. | Australia   30      15   Australia        0 |
                   |---------------------------------------------|
               31. | Australia   31      19   Australia        4 |
               32. | Australia   32      22   Australia        3 |
               33. | Australia   33      22   Australia        0 |
               34. | Australia   34      22   Australia        0 |
               35. | Australia   35      22   Australia        0 |
                   |---------------------------------------------|
               36. | Australia   36      22   Australia        0 |
               37. | Australia   37       .   Australia        . |
               38. |   Bahrain    1       0     Bahrain        0 |
               39. |   Bahrain    2       0     Bahrain        0 |
               40. |   Bahrain    3       0     Bahrain        0 |
                   |---------------------------------------------|
               41. |   Bahrain    4       0     Bahrain        0 |
               42. |   Bahrain    5       0     Bahrain        0 |
               43. |   Bahrain    6       0     Bahrain        0 |
               44. |   Bahrain    7       0     Bahrain        0 |
               45. |   Bahrain    8       0     Bahrain        0 |
                   |---------------------------------------------|
               46. |   Bahrain    9       0     Bahrain        0 |
               47. |   Bahrain   10       0     Bahrain        0 |
               48. |   Bahrain   11       0     Bahrain        0 |
               49. |   Bahrain   12       0     Bahrain        0 |
               50. |   Bahrain   13       0     Bahrain        0 |
                   |---------------------------------------------|
               51. |   Bahrain   14       0     Bahrain        0 |
               52. |   Bahrain   15       0     Bahrain        0 |
               53. |   Bahrain   16       0     Bahrain        0 |
               54. |   Bahrain   17       0     Bahrain        0 |
               55. |   Bahrain   18       0     Bahrain        0 |
                   |---------------------------------------------|
               56. |   Bahrain   19       0     Bahrain        0 |
               57. |   Bahrain   20       0     Bahrain        0 |
               58. |   Bahrain   21       0     Bahrain        0 |
               59. |   Bahrain   22       0     Bahrain        0 |
               60. |   Bahrain   23       0     Bahrain        0 |
                   |---------------------------------------------|
               61. |   Bahrain   24       0     Bahrain        0 |
               62. |   Bahrain   25       0     Bahrain        0 |
               63. |   Bahrain   26       0     Bahrain        0 |
               64. |   Bahrain   27       0     Bahrain        0 |
               65. |   Bahrain   28       0     Bahrain        0 |
                   |---------------------------------------------|
               66. |   Bahrain   29       0     Bahrain        0 |
               67. |   Bahrain   30       0     Bahrain        0 |
               68. |   Bahrain   31       0     Bahrain        0 |
               69. |   Bahrain   32       0     Bahrain        0 |
               70. |   Bahrain   33       0     Bahrain        0 |
                   |---------------------------------------------|
               71. |   Bahrain   34       1     Bahrain        1 |
               72. |   Bahrain   35      23     Bahrain       22 |
               73. |   Bahrain   36      33     Bahrain       10 |
               74. |   Bahrain   37      33     Bahrain        0 |
                   +---------------------------------------------+

              Comment


              • #8
                Thanks Nick. I ran the code and everything looks fine. Except I am still having a bit of a trouble in plotting the time series graph. I want lines to depict cases for a country per day


                I declared the data as time series. Actuals as the time component and ID as the panel variable. Now while plotting the graph I am not sure what to do. t will surely be the x axis, and actuals is the Y axis but how do I get the graph to represent different countryregions?


                Edit: I tried using the following command

                Code:
                xtline t actual
                It gave me separate line graphs for each country. Now even though this is better than no graph, there are just so many countries that the image quality is a bit deteriorated. I was looking for, maybe, all lines on one graph
                Last edited by Kartikeya Bhatotia; 02 Mar 2020, 02:54.

                Comment


                • #9
                  Code:
                  help xtline
                  explains that there is an overlay option.

                  Comment


                  • #10
                    Perfect. Thank you

                    Additionally, is there a way I can exclude some countryregions (let's say countries with just 1 or 2 cases) because the line graphs seem to be grouped right now. Since only a few countries should be the focus here, any way of dropping others?

                    i.e as you can see, China seems to have the highest number of cases. I don't want Finland and Spain with it as both countries have negligible cases in front of China

                    Image attached
                    Attached Files

                    Comment


                    • #11
                      I gave you the code for what you asked for -- in a rush before my journey to work -- but I feared that the result would be useless..

                      You have I think 51 countries superimposed and the results for one dominate. You won't be much better off with logarithmic scale even with some fudge for zeros. I agree that 51 panels is too many too from several points of view here.

                      More constructive advice would depend on knowing what you most want to do here, but no single graph can do all that you want. Otherwise see


                      Code:
                      help if
                      for how to select countries.

                      Comment


                      • #12
                        Let's say I want to display the top 10 countries by highest number of cumulative cases, how do I go about that? I am unsure where exactly to use the if command

                        Apologies, I am aware I am not the brightest at this

                        P.S: Is it also possible to do top 9, and group the rest as 'Others' - just an idea. Not sure if stata allows this

                        Comment


                        • #13
                          if here is a qualifier not a command. But otherwise I don't think this is trivial, so be reassured.

                          Code:
                          bysort countryregion (cumul) : gen largest = cond(_n == _N, cumul, .) 
                          
                          * note the MINUS sign 
                          egen rank = rank(-largest)  
                          bysort countryregion (rank) : replace rank = rank[1] 
                          
                          xtline cumul if rank <= 10
                          For 9 not 10, change accordingly

                          Comment


                          • #14
                            Thank you very much, Nick!

                            This is possibly the last time I bother: I was wondering if there is any way I can make country-specific line graphs for actuals

                            By this I mean, is there a way I can select a country arbitrarily to view its graph? I need this because the lines converge after Day 35, and a closer look might help

                            Comment


                            • #15
                              Again, use the if qualifier. A condition such as

                              Code:
                              ... if countryregion == "Australia"
                              or

                              Code:
                              .. if inlist(countryregion, "Mercury", "Venus", "Earth", "Mars")
                              will select countries for graphing. See
                              Code:
                              help inlist()
                              for limits on its use with string values.

                              Comment

                              Working...
                              X