  • Dropping several values

    Hi all,

    I have a question about dropping values within a variable.
    In my dataset, the variable gvkey identifies a company, salecs is the amount sold to a customer, and yeardate is the year in which that sale was made.
    A company can have multiple sales in a given year, but I want to keep only companies that have at least one sale in every year from 2000 through 2011. I don't know how to drop companies that have sales in several years but not in all years from 2000 through 2011.

    Here is part of my dataset. It shows one company that has sales in every year from 2000 through 2011 and one company that does not.

    gvkey salecs yeardate
    1004 132.048 2000
    1004 114 2000
    1004 139.072 2001
    1004 57.4 2001
    1004 163.173 2002
    1004 40.925 2003
    1004 85.103 2003
    1004 44.163 2003
    1004 125.059 2004
    1004 60.468 2004
    1004 37.031 2004
    1004 157.165 2005
    1004 25.976 2005
    1004 69.027 2005
    1004 77.34 2006
    1004 31.089 2006
    1004 191.809 2006
    1004 217.911 2007
    1004 32.184 2007
    1004 75.185 2007
    1004 99.752 2008
    1004 80.295 2008
    1004 10.84 2008
    1004 33.66 2008
    1004 294.249 2008
    1004 330.132 2008
    1004 36.453 2009
    1004 381.811 2009
    1004 72.533 2009
    1004 114.786 2009
    1004 302.016 2009
    1004 400.222 2010
    1004 44.289 2010
    1004 10.372 2010
    1004 289.435 2010
    1004 162.575 2010
    1004 34.179 2010
    1004 320.883 2011
    1004 308.585 2011
    1004 31.538 2011
    1004 16.516 2011
    1004 49.397 2011
    1004 524.129 2011
    1013 164.395 2000
    1013 197.274 2000
    1013 197.274 2000
    1013 263.032 2000
    1013 657.58 2000
    1013 657.58 2000
    1013 1150.765 2000
    1013 111.056 2002
    1013 198.712 2003
    1013 145.362 2003
    1013 170.977 2004
    1013 124.704 2004
    1013 104.312 2004
    1013 134.115 2004
    1013 154 2005
    1013 199 2005
    1013 146 2005
    1013 275 2005
    1013 168 2006
    1013 205 2006
    1013 191 2006
    1013 163 2007
    1013 204 2007
    1013 236 2007
    1013 146 2008
    1013 233 2008
    1013 240 2008
    1013 176 2009
    1013 71 2009
    1013 203 2009
    1013 84 2010
    1013 300 2010
    1013 146 2010

    As you can see, company 1013 has no sales in 2001 or 2011. What code should I use so that I am left with only the companies that have at least one sale in every year from 2000 through 2011?

    Thanks in advance

  • #2
    Sorry for the poor layout of my dataset; I couldn't work out how to post it more neatly here.


    • #3
      Perhaps this code will start you on your way.
      Code:
      clear all 
      cls
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input int gvkey float salecs int yeardate
      1004  132.048 2000
      1004      114 2000
      1004  139.072 2001
      1004     57.4 2001
      1004  163.173 2002
      1004   40.925 2003
      1004   85.103 2003
      1004   44.163 2003
      1004  125.059 2004
      1004   60.468 2004
      1004   37.031 2004
      1004  157.165 2005
      1004   25.976 2005
      1004   69.027 2005
      1004    77.34 2006
      1004   31.089 2006
      1004  191.809 2006
      1004  217.911 2007
      1004   32.184 2007
      1004   75.185 2007
      1004   99.752 2008
      1004   80.295 2008
      1004    10.84 2008
      1004    33.66 2008
      1004  294.249 2008
      1004  330.132 2008
      1004   36.453 2009
      1004  381.811 2009
      1004   72.533 2009
      1004  114.786 2009
      1004  302.016 2009
      1004  400.222 2010
      1004   44.289 2010
      1004   10.372 2010
      1004  289.435 2010
      1004  162.575 2010
      1004   34.179 2010
      1004  320.883 2011
      1004  308.585 2011
      1004   31.538 2011
      1004   16.516 2011
      1004   49.397 2011
      1004  524.129 2011
      1013  164.395 2000
      1013  197.274 2000
      1013  197.274 2000
      1013  263.032 2000
      1013   657.58 2000
      1013   657.58 2000
      1013 1150.765 2000
      1013  111.056 2002
      1013  198.712 2003
      1013  145.362 2003
      1013  170.977 2004
      1013  124.704 2004
      1013  104.312 2004
      1013  134.115 2004
      1013      154 2005
      1013      199 2005
      1013      146 2005
      1013      275 2005
      1013      168 2006
      1013      205 2006
      1013      191 2006
      1013      163 2007
      1013      204 2007
      1013      236 2007
      1013      146 2008
      1013      233 2008
      1013      240 2008
      1013      176 2009
      1013       71 2009
      1013      203 2009
      1013       84 2010
      1013      300 2010
      1013      146 2010
      end
      sort gvkey yeardate
      // flag a gap: consecutive observations of a company more than 1 year apart
      by gvkey: generate gap = gvkey==gvkey[_n-1] & yeardate-yeardate[_n-1] > 1
      // if a company has any gap, flag all of its observations
      by gvkey: egen todrop = max(gap)
      // also flag companies whose data do not run from 2000 through 2011
      by gvkey: replace todrop = 1 if yeardate[1]!=2000 | yeardate[_N]!=2011
      tab gvkey todrop
      Code:
      . tab gvkey todrop
      
                 |        todrop
           gvkey |         0          1 |     Total
      -----------+----------------------+----------
            1004 |        43          0 |        43 
            1013 |         0         33 |        33 
      -----------+----------------------+----------
           Total |        43         33 |        76
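      The code above only creates the todrop flag; running drop if todrop afterwards removes the flagged companies. As an alternative, here is a minimal sketch (not the approach used above) that counts, for each company, the distinct years from 2000 through 2011 with at least one sale and keeps the companies that have all 12. Unlike the first/last-year check, it would not flag a company just because it also has sales before 2000 or after 2011. The data are a trimmed subset of the rows posted above, and firstyr and nyears are hypothetical helper variable names chosen for illustration.

```stata
clear
// Trimmed subset of the posted data: 1004 has every year 2000-2011, 1013 does not
input int gvkey float salecs int yeardate
1004  132.048 2000
1004      114 2000
1004  139.072 2001
1004  163.173 2002
1004   40.925 2003
1004  125.059 2004
1004  157.165 2005
1004    77.34 2006
1004  217.911 2007
1004   99.752 2008
1004   36.453 2009
1004  400.222 2010
1004  320.883 2011
1013  164.395 2000
1013  197.274 2000
1013  111.056 2002
1013       84 2010
end

// Mark the first observation of each company-year that lies in 2000-2011
// (firstyr is a hypothetical helper variable)
bysort gvkey yeardate: generate byte firstyr = _n == 1 & inrange(yeardate, 2000, 2011)

// Count distinct in-window years per company (data are already sorted by gvkey)
by gvkey: egen nyears = total(firstyr)

// Keep only companies with at least one sale in each of the 12 years
keep if nyears == 12
drop firstyr nyears
```

      Counting distinct years also copes with several sales in the same year, because only the first observation of each company-year is marked.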
      And to answer your question from post #2, please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.
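      For example, with the data in memory, commands along these lines print a ready-to-paste listing of the variables, and an if qualifier restricts the listing to one company. dataex is built into recent versions of Stata; older versions can install it with ssc install dataex. The four observations below are a small subset of the posted data, included only so the sketch runs on its own.

```stata
clear
// A few rows of the posted data, just so there is something in memory
input int gvkey float salecs int yeardate
1004 132.048 2000
1004     114 2000
1013 164.395 2000
1013 111.056 2002
end

// Print the listed variables as a ready-to-paste example dataset
dataex gvkey salecs yeardate

// Restrict the listing to a single company
dataex gvkey salecs yeardate if gvkey == 1013
```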

      • #4
        Thank you very much for your quick response and help, this worked perfectly!
