Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate sum of multiple variable without including years when there are missing values


    I want to create sum of some variables but without including sum when there are missing values in one of the variables.

    I have been using the following command:

    gen totsh1= ccdppgusd + ccdppgdm + ccdppgyen + ccdppgffr+ ccdppgswr + ///
    ccdppgpau if year<=2000 & ccdppgusd!=. & ccdppgdm!=. & ccdppgyen!=. ///
    & ccdppgffr!=. & ccdppgswr!=. & ccdppgpau!=.


    but this creates a six when there are missing values for all variables or a sum of five or four variables' sum when the other are missing. I want to create the sum only if none of the variables is missing, otherwise this will be misleading for regression analysis afterwards.

    Below is a short dataset for one country which has many missing data:



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long country int year float id long(ccdppgusd ccdppgdm ccdppgyen ccdppgffr ccdppgswr ccdppgpau ccdppgeur) float totsh1
    2 1970 2    1   1   1   1   1   1   1    6
    2 1971 2    1   1   1   1   1   1   1    6
    2 1972 2    1   1   1   1   1   1   1    6
    2 1973 2    1   1   1   1   1   1   1    6
    2 1974 2    1   1   1   1   1   1   1    6
    2 1975 2    1   1   1   1   1   1   1    6
    2 1976 2    1   1   1   1   1   1   1    6
    2 1977 2    1   1   1   1   1   1   1    6
    2 1978 2    1   1   1   1   1   1   1    6
    2 1979 2    1   1   1   1   1   1   1    6
    2 1980 2    1   1   1   1   1   1   1    6
    2 1981 2    1   1   1   1   1   1   1    6
    2 1982 2    1   1   1   1   1   1   1    6
    2 1983 2    1   1   1   1   1   1   1    6
    2 1984 2    1   1   1   1   1   1   1    6
    2 1985 2    1   1   1   1   1   1   1    6
    2 1986 2    1   1   1   1   1   1   1    6
    2 1987 2    1   1   1   1   1   1   1    6
    2 1988 2    1   1   1   1   1   1   1    6
    2 1989 2    1   1   1   1   1   1   1    6
    2 1990 2    1   1   1   1   1   1   1    6
    2 1991 2    1   1   1   1   1   1   1    6
    2 1992 2    1   1   1   1   1   1   1    6
    2 1993 2 1274 445 353 278 846 687   1 3883
    2 1994 2  994 618 841 278 846 687   1 4264
    2 1995 2  872 628 751 278 739 687   1 3955
    2 1996 2  207 829 833 278 828 687   1 3662
    2 1997 2  374 738 726 278 743 687   1 3546
    2 1998 2  493 732 724 278 626 687   1 3540
    2 1999 2  643 622 814 278 473 687   1 3517
    2 2000 2  798 471 734 278 275 687   1 3243
    2 2001 2  958   1 747   1  59 687 502    .
    2 2002 2 1022   1 735   1  18 687 501    .
    2 2003 2  918   1 731   1 846 687 512    .
    2 2004 2 1226   1 489   1 846 687 394    .
    2 2005 2 1483   1 366   1 846 687 204    .
    2 2006 2 1209   1 286   1 846 687 403    .
    2 2007 2 1863   1  36   1 846 687 665    .
    2 2008 2 1926   1 353   1 846 687 335    .
    2 2009 2 1888   1 353   1 846 687 636    .
    2 2010 2 1699   1  82   1 846 687 620    .
    2 2011 2 1784   1 174   1 846 687 493    .
    2 2012 2 1925   1 113   1 846 687 434    .
    2 2013 2 1934   1  43   1 846 687 430    .
    2 2014 2 1842   1  17   1 846 687 343    .
    2 2015 2 1815   1 353   1 846 687 658    .
    2 2016 2 1832   1 353   1 846 687 605    .
    2 2017 2 1864   1 353   1 846 687 560    .
    2 2018 2 1893   1 353   1 846 687  90    .
    end
    format %ty year
    label values country country
    label def country 2 "Belarus", modify
    label values ccdppgusd ccdppgusd
    label def ccdppgusd 1 ".", modify
    label def ccdppgusd 207 "26.2413", modify
    label def ccdppgusd 374 "33.7801", modify
    label def ccdppgusd 493 "39.1571", modify
    label def ccdppgusd 643 "43.884", modify
    label def ccdppgusd 798 "48.9203", modify
    label def ccdppgusd 872 "50.8244", modify
    label def ccdppgusd 918 "52.2941", modify
    label def ccdppgusd 958 "53.2052", modify
    label def ccdppgusd 994 "54.277", modify
    label def ccdppgusd 1022 "55.4514", modify
    label def ccdppgusd 1209 "61.0813", modify
    label def ccdppgusd 1226 "61.604", modify
    label def ccdppgusd 1274 "63.8919", modify
    label def ccdppgusd 1483 "71.77800000000001", modify
    label def ccdppgusd 1699 "81.0766", modify
    label def ccdppgusd 1784 "85.6798", modify
    label def ccdppgusd 1815 "87.79519999999999", modify
    label def ccdppgusd 1832 "88.9573", modify
    label def ccdppgusd 1842 "89.4439", modify
    label def ccdppgusd 1863 "90.17", modify
    label def ccdppgusd 1864 "90.1717", modify
    label def ccdppgusd 1888 "91.84820000000001", modify
    label def ccdppgusd 1893 "92.0585", modify
    label def ccdppgusd 1925 "95.3184", modify
    label def ccdppgusd 1926 "95.41549999999999", modify
    label def ccdppgusd 1934 "95.78830000000001", modify
    label values ccdppgdm ccdppgdm
    label def ccdppgdm 1 ".", modify
    label def ccdppgdm 445 "17.5736", modify
    label def ccdppgdm 471 "19.3503", modify
    label def ccdppgdm 618 "23.8942", modify
    label def ccdppgdm 622 "24.9344", modify
    label def ccdppgdm 628 "27.532", modify
    label def ccdppgdm 732 "30.6842", modify
    label def ccdppgdm 738 "35.4765", modify
    label def ccdppgdm 829 "42.2266", modify
    label values ccdppgyen ccdppgyen
    label def ccdppgyen 1 ".", modify
    label def ccdppgyen 17 ".0397", modify
    label def ccdppgyen 36 ".0837", modify
    label def ccdppgyen 43 ".0974", modify
    label def ccdppgyen 82 ".1731", modify
    label def ccdppgyen 113 ".2069", modify
    label def ccdppgyen 174 ".3584", modify
    label def ccdppgyen 286 ".66", modify
    label def ccdppgyen 353 "0", modify
    label def ccdppgyen 366 "1.0693", modify
    label def ccdppgyen 489 "1.7044", modify
    label def ccdppgyen 724 "2.1246", modify
    label def ccdppgyen 726 "2.1423", modify
    label def ccdppgyen 731 "2.1712", modify
    label def ccdppgyen 734 "2.1919", modify
    label def ccdppgyen 735 "2.198", modify
    label def ccdppgyen 747 "2.2648", modify
    label def ccdppgyen 751 "2.2871", modify
    label def ccdppgyen 814 "2.699", modify
    label def ccdppgyen 833 "2.8102", modify
    label def ccdppgyen 841 "2.8735", modify
    label values ccdppgffr ccdppgffr
    label def ccdppgffr 1 ".", modify
    label def ccdppgffr 278 "0", modify
    label values ccdppgswr ccdppgswr
    label def ccdppgswr 1 ".", modify
    label def ccdppgswr 18 ".0035", modify
    label def ccdppgswr 59 ".0179", modify
    label def ccdppgswr 275 ".1264", modify
    label def ccdppgswr 473 ".2985", modify
    label def ccdppgswr 626 ".4667", modify
    label def ccdppgswr 739 ".6745", modify
    label def ccdppgswr 743 ".6865", modify
    label def ccdppgswr 828 ".9300000000000001", modify
    label def ccdppgswr 846 "0", modify
    label values ccdppgpau ccdppgpau
    label def ccdppgpau 1 ".", modify
    label def ccdppgpau 687 "0", modify
    label values ccdppgeur ccdppgeur
    label def ccdppgeur 1 ".", modify
    label def ccdppgeur 90 "1.5293", modify
    label def ccdppgeur 204 "14.3772", modify
    label def ccdppgeur 335 "2.8065", modify
    label def ccdppgeur 343 "2.883", modify
    label def ccdppgeur 394 "25.1219", modify
    label def ccdppgeur 403 "26.0298", modify
    label def ccdppgeur 430 "3.0653", modify
    label def ccdppgeur 434 "3.1367", modify
    label def ccdppgeur 493 "3.9485", modify
    label def ccdppgeur 501 "30.5531", modify
    label def ccdppgeur 502 "30.7578", modify
    label def ccdppgeur 512 "32.1096", modify
    label def ccdppgeur 560 "4.4751", modify
    label def ccdppgeur 605 "5.0968", modify
    label def ccdppgeur 620 "5.4669", modify
    label def ccdppgeur 636 "5.9391", modify
    label def ccdppgeur 658 "6.3175", modify
    label def ccdppgeur 665 "6.709", modify


























  • #2
    Code:
    egen wanted = rowtotal(ccdppgusd ccdppgdm ccdppgyen ccdppgffr ccdppgswr ccdppgpau)
    does what you want, i.e. ignore missings, namely treating them as implicit zeros.

    That said, your data look a complete mess, insofar as string variables that "should be" numeric have been pushed through encode. destring should have been used.

    Otherwise put, it seems that to me that the values you want to use have ended up as value labels, and that is my guess at how it happened.

    Comment


    • #3
      If I am right, this is what you need to do


      Code:
      foreach v of var ccd* {
         decode `v', gen(work)
         destring work, replace
         replace `v' = work
         label val `v'
         drop work
      }

      Comment


      • #4
        Thanks Nick. I found the problem. These variables were strings. That's why I was having this problem, otherwise Stata never calculates the sum when there are missing variables.

        Comment


        • #5
          The data you presented in #1 were all numeric variables.

          They were the wrong numeric variables, on my diagnosis, but the problem with the addition not working was not that the variables were string.

          In fact, addition works fine always for string variables and missing string values (not "missing variables") just get added too. Thus

          Code:
          "frog" + "" + "toad"
          is just "frogtoad".

          If what you write in #4 is an agreement that string variables in a previous version of your data were mistakenly encoded, then there you go. But the wording in #4 otherwise makes no sense.

          Comment


          • #6
            Yeah, you are right. The were wrong numeric variables. That's why I got that much labels in dataex.

            Comment

            Working...
            X