Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • duplicate data

    Hello,

    so i have to summarize some investmentfunds that kinda look like this:

    fundid fundnr isin cusip calmt quarter month year
    fs00008l08 1 US74316J7303 74316J730 631 210 8 2012
    fs00008l08 1 US74316J7303 74316J730 639 213 4 2013
    fs00008l08 1 US74316J7303 74316J730 661 220 2 2015
    fs00008l08 1 US74316J7303 74316J730 632 210 9 2012

    fs00008l0w 2 US00766Y4897 00766Y489 680 226 9 2016
    fs00008l0w 2 US00766Y4897 00766Y489 701 233 6 2018
    fs00008l0w 2 US00766Y4897 00766Y489 621 207 10 2011
    fs00008l0w 2 US00766Y4897 00766Y489 668 222 9 2015 [.....]

    mf_max_age
    3.753593
    3.753593
    3.753593
    3.753593

    6.885695
    6.885695
    6.885695
    6.885695



    (simplified)
    so following problem:

    i am trying to find out the average max age for every fundnr, the following problem is that the fundnr are duplicates because they represent different months/years/returns within the fund but they have the same max age. So if I try 'mean(mf_max_age)' this will simply not work because of the unnecessary duplicates of the fundnr. I know I can delete them, but that would mess up the whole database, if i had only one fund of each fundnr left (i woud lose information on different years within the fundnrs and so on)
    How can I avoid this problem? Maybe something like: mean(mf_max_age) for each fundnr only once?
    Thank you for any recommendations!

  • #2
    duplicates tag
    then use if to exclude duplicates.

    Comment

    Working...
    X