Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Collapsing time variable

    Hello all,

    I am quite new to stata and I need help with a simple question (I think), that I just can't figure out. I am attempting to add the values in the count variable, so that I end up with unique values on year. Count refers to the number of events I am analyzing. So 1 menas that it happened once, 0 that the event did not occur that year. My dataset looks like this:

    count year
    1 1990
    1 1992
    1 1992
    1 1993
    1 1994
    1 1995
    1 1995
    1 1995
    1 1995
    1 1996
    1 1996
    1 1996
    1 1997
    1 1998
    1 2000
    1 2000
    1 2001
    1 2001
    1 2007
    0 2008
    1 2008
    0 2009
    1 2011
    1 2014
    1 2014

    I wish to sum the count for each year, so that the dataset will look like this:

    count year
    1 1990
    2 1992
    1 1993
    1 1994
    4 1995
    3 1996
    1 1997
    1 1998
    2 2000
    2 2001
    1 2007
    1 2008
    0 2009
    1 2011
    2 2014

    I then wish to "fill in the gaps", so that my dataset looks like this:

    1 1990
    0 1991
    2 1992
    1 1993
    1 1994
    4 1995
    3 1996
    1 1997
    1 1998
    0 1999
    2 2000
    1 2001
    0 2002
    0 2003
    0 2004
    0 2005
    0 2006
    1 2007
    1 2008
    0 2009
    0 2010
    1 2011
    0 2012
    0 2013
    2 2014

    I need to expand the dataset as I am going to include more variables where there will be values for every year.

    All help is appreciated!

    Regards,
    Madeleine


  • #2
    Let's say your dataset is madeleine.dta

    Code:
    clear 
    * years 1990 to 2014; change as needed 
    set obs 25 
    gen year = _n + 1989 
    
    merge 1:m year using madeleine 
    drop _merge
    replace count = 0 if count == . 
    collapse (sum) count, by(year)

    Comment


    • #3
      Hey Nick,

      Thanks for your reply! From your code, this is what stata generates:

      year count
      1 1
      2 2
      3 1
      4 1
      5 4
      6 3
      7 1
      8 1
      9 2
      10 2
      11 1
      12 2
      13 1
      14 1
      15 2
      1990 0
      1991 0
      1992 0
      1993 0
      1994 0
      1995 0
      1996 0
      1997 0
      1998 0
      1999 0
      2000 0
      2001 0
      2002 0
      2003 0
      2004 0
      2005 0
      2006 0
      2007 0
      2008 0
      2009 0
      2010 0
      2011 0
      2012 0
      2013 0
      2014 0

      Any suggestions as to what I am doing wrong?

      Regards,
      Madeleine

      Comment


      • #4
        The observations with year 1 to 15 don't come from my code. So, they come from your original dataset. You need to

        Code:
        edit if year < 100
        (say) and work out what's wrong. Or, what you told us in #1 is wrong somehow. Use dataex to show us what the data really are like. Perhaps what you see are value labels, not values.
        Last edited by Nick Cox; 20 Nov 2019, 10:57.

        Comment


        • #5
          Thanks Nick!

          Realized from the dataex command that for variable year it was not the actual values of the variable but the value labels attached to those values. My apologies, and thanks again. Got the result i needed.

          Regards,
          Madeleine

          Comment

          Working...
          X