Hello everyone,
I am working with Stata to finish my thesis. I want to use a Repeat Sales Regression which is a methodology commonly used in the real estate literature. Before I can start my analysis I am preparing my dataset. I am curious about a phenomen that happened in September 2008. Therefore, I want to look at properties which are sold for the first time before September 2008, and then look at properties which are sold after September 2008.
The point I am struggling with is that some properties are sold more than once before September 2008, which makes only the last sale before September 2008 the most useful one. The first two observations can be dropped.
I can give you an example of what my dataset looks like. Addressid is the property id that is sold, which can be multiple times over the period. Salesdate is the actual date the property was sold. And Salesnr is the number of times the property is sold.
For addressid 001, no further action is warranted because one sale is before September 2008, and one after. However, both addressid 002 and 003 have two sales before September 2008. Now I want to drop the first observation because this one is not useful for my analysis, but I can't drop salesnr == 1, because then I would also drop the right value for addressid == 001. Furthermore, addressid 004 has multiple sales after September 2008 but these are still relevant for my analysis.
I tried using the following function, but I cannot add weights:
by addressid: drop addressid[_n==1] if Salesdate[_n==2] < 20080900
I really hope you guys can help me. I've tried a couple of things but they don't seem to work.
Kind regards,
Stijn Martens
I am working with Stata to finish my thesis. I want to use a Repeat Sales Regression which is a methodology commonly used in the real estate literature. Before I can start my analysis I am preparing my dataset. I am curious about a phenomen that happened in September 2008. Therefore, I want to look at properties which are sold for the first time before September 2008, and then look at properties which are sold after September 2008.
The point I am struggling with is that some properties are sold more than once before September 2008, which makes only the last sale before September 2008 the most useful one. The first two observations can be dropped.
I can give you an example of what my dataset looks like. Addressid is the property id that is sold, which can be multiple times over the period. Salesdate is the actual date the property was sold. And Salesnr is the number of times the property is sold.
Addressid | Salesdate (yyyy-mm-dd) | Salesnr |
001 | 20020308 | 1 |
001 | 20120509 | 2 |
002 | 20010201 | 1 |
002 | 20050518 | 2 |
002 | 20110315 | 3 |
003 | 20030711 | 1 |
003 | 20060801 | 2 |
003 | 20090108 | 3 |
004 | 20050423 | 1 |
004 | 20091106 | 2 |
004 | 20120829 | 3 |
I tried using the following function, but I cannot add weights:
by addressid: drop addressid[_n==1] if Salesdate[_n==2] < 20080900
I really hope you guys can help me. I've tried a couple of things but they don't seem to work.
Kind regards,
Stijn Martens
Comment