Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Drop first observation if second observation is before a certain date

    Hello everyone,

    I am working with Stata to finish my thesis. I want to use a Repeat Sales Regression which is a methodology commonly used in the real estate literature. Before I can start my analysis I am preparing my dataset. I am curious about a phenomen that happened in September 2008. Therefore, I want to look at properties which are sold for the first time before September 2008, and then look at properties which are sold after September 2008.

    The point I am struggling with is that some properties are sold more than once before September 2008, which makes only the last sale before September 2008 the most useful one. The first two observations can be dropped.

    I can give you an example of what my dataset looks like. Addressid is the property id that is sold, which can be multiple times over the period. Salesdate is the actual date the property was sold. And Salesnr is the number of times the property is sold.
    Addressid Salesdate (yyyy-mm-dd) Salesnr
    001 20020308 1
    001 20120509 2
    002 20010201 1
    002 20050518 2
    002 20110315 3
    003 20030711 1
    003 20060801 2
    003 20090108 3
    004 20050423 1
    004 20091106 2
    004 20120829 3
    For addressid 001, no further action is warranted because one sale is before September 2008, and one after. However, both addressid 002 and 003 have two sales before September 2008. Now I want to drop the first observation because this one is not useful for my analysis, but I can't drop salesnr == 1, because then I would also drop the right value for addressid == 001. Furthermore, addressid 004 has multiple sales after September 2008 but these are still relevant for my analysis.

    I tried using the following function, but I cannot add weights:

    by addressid: drop addressid[_n==1] if Salesdate[_n==2] < 20080900

    I really hope you guys can help me. I've tried a couple of things but they don't seem to work.

    Kind regards,
    Stijn Martens

  • #2
    That is some way away from legal drop syntax as you can drop variables or observations but not both at the same time. The fact that you're mixing syntaxes fantastically (what Stata guesses are weights are mixes of true-or-false expressions and subscripts, based perhaps on other software you know) is just what bites first.

    I see what you're doing with your dates as long integers but at some point that won't help you, such as if you want the difference between two daily dates when 20080831 and 20080901 don't differ by 1 but by 70! So, I advise using Stata's own system for daily dates.

    With some engineering to take your helpful data example (helpful, not as helpful as using dataex (SSC) such as we request) this shows some technique:


    Code:
    clear 
    input str3 Addressid str8 sSalesdate Salesnr
    001    20020308    1
    001    20120509    2
    002    20010201    1
    002    20050518    2
    002    20110315    3
    003    20030711    1
    003    20060801    2
    003    20090108    3
    004    20050423    1
    004    20091106    2
    004    20120829    3
    end 
    
    gen Salesdate = daily(sSalesdate, "YMD") 
    format Salesdate %td 
    
    gen tokeep = Salesdate >= mdy(9, 1, 2008) 
    bysort Addressid (Salesdate) : replace tokeep = 1 if tokeep[_n+1] 
    list, sepby(Addressid) 
    
         +----------------------------------------------------+
         | Addres~d   sSales~e   Salesnr   Salesdate   tokeep |
         |----------------------------------------------------|
      1. |      001   20020308         1   08mar2002        1 |
      2. |      001   20120509         2   09may2012        1 |
         |----------------------------------------------------|
      3. |      002   20010201         1   01feb2001        0 |
      4. |      002   20050518         2   18may2005        1 |
      5. |      002   20110315         3   15mar2011        1 |
         |----------------------------------------------------|
      6. |      003   20030711         1   11jul2003        0 |
      7. |      003   20060801         2   01aug2006        1 |
      8. |      003   20090108         3   08jan2009        1 |
         |----------------------------------------------------|
      9. |      004   20050423         1   23apr2005        1 |
     10. |      004   20091106         2   06nov2009        1 |
     11. |      004   20120829         3   29aug2012        1 |
         +----------------------------------------------------+
    
    keep if tokeep 
    drop tokeep 
    list, sepby(Addressid) 
    
         +-------------------------------------------+
         | Addres~d   sSales~e   Salesnr   Salesdate |
         |-------------------------------------------|
      1. |      001   20020308         1   08mar2002 |
      2. |      001   20120509         2   09may2012 |
         |-------------------------------------------|
      3. |      002   20050518         2   18may2005 |
      4. |      002   20110315         3   15mar2011 |
         |-------------------------------------------|
      5. |      003   20060801         2   01aug2006 |
      6. |      003   20090108         3   08jan2009 |
         |-------------------------------------------|
      7. |      004   20050423         1   23apr2005 |
      8. |      004   20091106         2   06nov2009 |
      9. |      004   20120829         3   29aug2012 |
         +-------------------------------------------+

    Comment


    • #3
      Awesome, thank you so much Mr. Cox!

      Comment

      Working...
      X