define a dummy?

River Huang

Join Date: Mar 2016
Posts: 1899

21 Dec 2021, 02:49

Dear All, I found this question here (in Chinese). Given the data

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id str33 change str4 year byte repeat_times
600000 "enter"                           "2015" 0
600007 "enter"                           "2015" 0
600004 "enter"                           "2015" 1
600004 "remove"                          "2021" 1
600006 "enter"                           "2016" 1
600006 "remove"                          "2018" 1
600008 "enter"                           "2015" 2
600008 "remove"                          "2019" 2
600008 "enter（remove, and re-enter）" "2019" 2
600160 "enter"                           "2015" 2
600160 "remove"                          "2017" 2
600160 "enter（remove, and re-enter）" "2018" 2
600675 "enter"                           "2015" 3
600675 "remove"                          "2015" 3
600675 "enter（remove, and re-enter）" "2019" 3
600675 "remove"                          "2021" 3
600686 "enter"                           "2015" 3
600686 "remove"                          "2015" 3
600686 "enter（remove, and re-enter）" "2016" 3
600686 "remove"                          "2017" 3
end

I wish to define a dummy that equals 1 from the `year' in which an `id' enters a program onward, and 0 otherwise. When the `id' is removed from the program, the dummy should return to 0 and stay there until the `id' enters again. Note that some `id's may enter and be removed several times.
Is it also possible to delete the observations (`id's) for which the duration between the enter year and the remove year is less than 1 year?

Thanks.

Ho-Chuan (River) Huang
Stata 17.0, MP(4)


Andrew Musau

Join Date: Oct 2014
Posts: 9957

21 Dec 2021, 13:32

The description is not very clear. It appears that you want to tag every second observation. Also, I assume that the variable "change" does not exist in the original dataset. With these assumptions:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long id str4 year
600000 "2015"
600004 "2015"
600004 "2021"
600006 "2016"
600006 "2018"
600007 "2015"
600008 "2015"
600008 "2019"
600008 "2019"
600160 "2015"
600160 "2017"
600160 "2018"
600675 "2015"
600675 "2015"
600675 "2019"
600675 "2021"
600686 "2015"
600686 "2015"
600686 "2016"
600686 "2017"
end

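* year is read in as a string; convert it to numeric
destring year, replace
*#1 events alternate enter/remove within id, so odd-numbered rows are entries
bys id (year): gen wanted=mod(_n, 2)
*#2 where an id has two events in the same year, drop the remove row
bys id (year wanted): drop if !wanted & year==year[_n+1]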

Result:

Code:

. gsort id year -wanted

. l, sepby(id)

     +------------------------+
     |     id   year   wanted |
     |------------------------|
  1. | 600000   2015        1 |
     |------------------------|
  2. | 600004   2015        1 |
  3. | 600004   2021        0 |
     |------------------------|
  4. | 600006   2016        1 |
  5. | 600006   2018        0 |
     |------------------------|
  6. | 600007   2015        1 |
     |------------------------|
  7. | 600008   2015        1 |
  8. | 600008   2019        1 |
     |------------------------|
  9. | 600160   2015        1 |
 10. | 600160   2017        0 |
 11. | 600160   2018        1 |
     |------------------------|
 12. | 600675   2015        1 |
 13. | 600675   2019        1 |
 14. | 600675   2021        0 |
     |------------------------|
 15. | 600686   2015        1 |
 16. | 600686   2016        1 |
 17. | 600686   2017        0 |
     +------------------------+

.
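
If the aim is a dummy for every year from entry onward, rather than only for the event rows, one possible sketch is below. It is not a definitive answer and rests on several assumptions: it is run in place of #1 and #2, starting from the data right after -destring-; events strictly alternate enter/remove within each id; the dummy records the status at the end of each year, so an id that enters and is removed in the same year gets 0 for that year; and the panel should span the earliest to the latest year in the data. The names seq and status are made up for illustration.

```stata
* sketch only: assumes id and numeric year are in memory (after -destring-)
* and that events alternate enter, remove, enter, ... within each id
bysort id (year): gen byte seq = _n         // event order within id
gen byte status = mod(seq, 2)               // 1 = enter, 0 = remove

* keep the last event of each id-year, i.e. the status at the end of that year
bysort id year (seq): keep if _n == _N

* build a balanced yearly panel from the earliest to the latest year in the data
xtset id year
tsfill, full

* carry the last known status forward into the filled-in years
bysort id (year): replace status = status[_n-1] if missing(status)

* years before an id's first entry are still missing: not in the program
replace status = 0 if missing(status)
```

Under the year-end convention assumed here, a spell that starts and ends within the same year never shows up as 1, which may or may not be what the second part of the question is after.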


River Huang

Join Date: Mar 2016

Posts: 1899
#3

21 Dec 2021, 18:05

Dear Andrew, Sorry for the unclear description. One reason is, of course, my poor English; the other is probably the complexity of the data and the question. I will try to understand the problem better and may come back to you later. Thanks.

Ho-Chuan (River) Huang
Stata 17.0, MP(4)