Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Adding a new row for an existing variable

    Hello everyone!

    I am struggling with this data management task in the table below:
    In the part of the dataset below, you have the number of infant death caused by two disease, disaggregated at regional level.

    Click image for larger version

Name:	Schermata 2019-07-04 alle 16.38.31.png
Views:	1
Size:	288.9 KB
ID:	1506082


    So, as you can see, the mdreg_1 variable tells me which part of the world the observations are from.
    These have been sorted by year, as you can see.
    Now I want to add another label for this mdreg_1 var which takes the value World and this must be equal to the sum of all death by disease.
    I struggle to do this, in the sense that I can generate with egen the count of death by disease by year but this will jsut be another variable while I need World to just be an additional row for each year.

    The reason why I need World to be on another row is just as I was asked to do so in a test I was given sometime ago.

    Thanks very much for your support!

    Maria
    Last edited by Maria Centofanti; 04 Jul 2019, 09:46.

  • #2
    Hi Maria, I think this should work.
    Code:
    bysort year: egen diarrhoea_world = sum(diarrhoea)
    bysort year: egen pnoemonia_world = sum(pnoemonia)
    bysort year (mdreg_1): gen last = _n == _N
    expand 2 if last
    bysort year (mdreg_1): replace mdreg_1 = mdreg_1[_n-1]+1 if _n == _N
    bysort year (mdreg_1): replace pnoemonia = pnoemonia_world if _n == _N
    bysort year (mdreg_1): replace diarrhoea = diarrhoea_world if _n == _N
    label define mdreg_1 12 "World", modify
    Note, I'm assuming here that you used encode on your region variable and that it has values from 1 to 11. If that's not the case you may have to amend the list line a bit.

    Comment


    • #3
      I have to say that this is (a) spreadsheet thinking (b) a bad idea in Stata practice. (a) doesn't imply (b), which is why I flag points separately.

      Having totals in the data set is fine, but doing it your way commits you ever after to exclude the observations with totals from calculations so that you are not double counting in messy ways.

      The way to include totals is like this. I can't use your data because you are showing a screenshot, contrary to explicit advice in the FAQ, but no matter. Here is a silly example to show the point.


      Code:
      . sysuse auto, clear
      (1978 Automobile Data)
      
      . egen total_price = total(price), by(foreign)
      
      . tabdisp foreign, c(total_price)
      
      -----------------------
       Car type | total_price
      ----------+------------
       Domestic |      315766
        Foreign |      140463
      -----------------------
      
      . egen tag = tag(foreign)
      
      . list foreign total_price if tag , noobs
      
        +---------------------+
        |  foreign   total_~e |
        |---------------------|
        | Domestic     315766 |
        |  Foreign     140463 |
        +---------------------+
      I calculated the total price for groups of cars. If you look at the data you will see that the same total is repeated for all observations in the same group, but this isn't a problem. Stata offers various ways to get round this and I have shown two. tabdisp is smart in this situation and (more generally) you can tag just one observation in each group for further display or calculations.

      This way of thinking takes a little while to get used to but I will invoke 28 years of Stata experience to say that it beats adding extra observations hands down.

      Comment


      • #4
        Hi!

        I have a similar issue, i.e I'm trying to add a "total" row to a data frame such that it corresponds to the sum of all my groups' observations.
        I would like to use your method Nick Cox, but for now I don't see how using either tabdisp or tag will help me to generate this new total row.

        In short, considering this code :

        sysuse auto, clear
        collapse(sum) price, by(foreign)

        How do I add a third row that shows "total" for the foreign variable and the sum of the two figures above for the price variable ?

        Many thanks !
        Baptiste
        Last edited by Baptiste Roux; 02 Nov 2020, 11:57.

        Comment

        Working...
        X