keep variables longitudinal dataset

Ana Vasconcelos

Join Date: Aug 2016
Posts: 193

09 Mar 2024, 13:13

Hello,

I have a longitudinal dataset (example below), and I would like to keep the variables where individuals answer both "jbsat" and "abused" in both waves. Could you please help me with the code?
Thank you in advance.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte jbsat float abused long pidp int dvage
6 0  272238687 53
. 0 1429166207 65
. 0  410562247 53
7 0  408957451 60
5 1 1021413047 51
5 0 1157811539 23
7 0 1292038767 65
. 0  367828329 37
5 0  410567687 49
6 0  342026415 30
4 1  274411287 37
4 0  136255007 61
. 0  204671847 69
5 1  476031301 24
6 0  884396451 27
. 0  410593527 54
6 0 1156820087 45
6 0  817969971 54
6 1 1564067331 62
. 0  410600331 18
4 0  342305887 55
6 0  476295127 55
6 0 1632140767 55
6 0  700830449 43
1 1  204994171 56
6 0  273015931 41
. 0 1292905767 76
. 0 1156448807 71
7 0  544184971 51
. 0  545266167 71
7 0  680529045 62
. 0 1360684087 61
6 0  136126487 45
5 0  496921565 49
. 0  410860085 75
6 0  410881845 45
. 0  816336611 43
6 0  683200085 70
7 1  273200207 71
. 0  341330767 80
6 0 1429297451 47
7 0  136399855 18
6 0  410953925 52
4 0  204004095 32
7 0  817726527 51
5 0 1020901691 47
. 0  557895821 16
2 0  411071573 59
6 0  411071573 65
5 0  558048133 36
6 0 1428275407 36
5 0  953045847 37
5 0 1225617047 61
6 0  411169485 68
. 0 1292006807 70
. 0  682171935 16
7 0  749207011 48
6 0  477606169 77
2 0  748533131 54
5 0  411286445 51
2 0 1224479407 43
6 1  884665727 52
. 0  205283859 34
6 0  681335533 25
5 0  361357445 42
. 0 1020211487 68
6 0 1157296103 27
6 0 1429068971 58
. 0  817175727 66
6 0  411338125 37
7 0 1088680691 43
5 0  411346973 50
. 0  136133287 71
6 0  409027491 66
. 0  408696339 20
. 0  204038767 72
. 0  411390489 40
7 0 1633184567 65
6 0  231680093 28
5 0  411495205 55
. 0  411558445 70
5 0  749221979 20
6 0  411568649 42
5 0  411568649 49
5 0  411578165 48
. 0  476283571 68
. 0  411578173 18
7 0  477110451 67
6 0  411612845 62
6 0  409332811 72
5 0 1429634047 49
. 0  681193407 66
6 0 1020187691 46
7 0  952630367 48
. 0  411752245 78
6 0  408611335 27
6 0  411808013 41
6 0  750630927 32
. 0  136133287 65
5 0  205904687 56
end
label values jbsat e_jbsat
label def e_jbsat 1 "completely dissatisfied", modify
label def e_jbsat 2 "mostly dissatisfied", modify
label def e_jbsat 4 "neither satisfied or dissatisfied", modify
label def e_jbsat 5 "somewhat satisfied", modify
label def e_jbsat 6 "mostly satisfied", modify
label def e_jbsat 7 "completely satisfied", modify
label values abused abc
label def abc 0 "not abused at work", modify
label def abc 1 "abused at work", modify
label values dvage e_dvage

Tags: longitudinal data, panel data

Clyde Schechter

Join Date: Apr 2014

Posts: 29788
#2

09 Mar 2024, 13:32

Your question is unclear and the example data is incomplete. You refer to keeping those who answer yes to jbsat and abused in both waves. But most of the different respondents (I assume pidp is a respondent identifier) have only a single observation. So we don't know if they would have answered yes in another wave. Should they be included or not if they answered yes to both items in the one and only wave where they appear in the data?

Moreover, it is unclear what you mean by answering "yes" to jbsat. The variable jbsat is not a dichotomy; its scores range from 1 to 7, and those 7 options are degrees of satisfaction, not agreement with something. So what would you consider a "yes" answer to that question?
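
One quick way to see how many observations each respondent contributes is sketched below. It assumes pidp identifies respondents and that each observation is one wave; the toy ids and the variable names n_obs and one_per_person are made up for illustration and do not come from the original dataset.

```stata
* Toy data with hypothetical ids; replace with the real dataset
clear
input long pidp int dvage byte jbsat float abused
1 40 6 0
1 41 5 0
2 30 . 0
3 50 7 1
3 51 . 0
end

* Number of observations (waves) each respondent contributes
by pidp, sort: gen n_obs = _N

* Tag one observation per respondent so each person is counted once
egen byte one_per_person = tag(pidp)
tabulate n_obs if one_per_person
```

Respondents with n_obs equal to 1 appear in only one wave, so whether to keep them has to be decided before any filtering.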
Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#3

09 Mar 2024, 13:49

In both waves, there are 16666 individuals who answer the question about job satisfaction (variable "jbsat", from 1 to 7). There are 24861 individuals who answer the question about abuse (variable "abused", either 0 or 1). I would like to keep only the individuals ("pidp") who answer both questions in the 2 waves. Should I send more information about the dataset?
Thank you very much in advance.
Clyde Schechter

Join Date: Apr 2014

Posts: 29788
#4

09 Mar 2024, 14:31

Yes, but you haven't answered my questions, and you seem to have changed what you are asking for.

So I'm still unsure what you are asking for. Do you want to discard all people who only responded in one of the waves? And, if that is correct, do you still want to keep only those who answered both of the questions "yes" both times, or is it enough that they simply responded in both waves? And if you do want only those who answered "yes" both times, which answer(s) to the jbsat question, with responses 1 through 7, count(s) as a "yes"?
Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#5

09 Mar 2024, 15:11

Yes, I want to discard all people who only responded in one of the waves.
I want to keep those who simply responded in both waves.
I am sorry for the confusion.
Thank you very much

Clyde Schechter

Join Date: Apr 2014
Posts: 29788
#6

09 Mar 2024, 15:19

Code:

by pidp (dvage), sort: egen jbsat_count = count(jbsat)
by pidp (dvage): egen egen abused_count = count(abused)
keep if jbsat_count >= 2 & abused_count >= 2


Ana Vasconcelos

Join Date: Aug 2016

Posts: 193
#7

16 Mar 2024, 05:15

Hello,

I tried the code you sent, and an error appears saying "too many variables specified". Can you please help me with this?
Thank you in advance
Clyde Schechter

Join Date: Apr 2014

Posts: 29788
#8

16 Mar 2024, 10:33

I see my mistake in the second command, where -egen- appears twice in a row. The code should be:

Code:

by pidp (dvage), sort: egen jbsat_count = count(jbsat)
by pidp (dvage): egen abused_count = count(abused) // N.B. -egen- ONLY ONCE
keep if jbsat_count >= 2 & abused_count >= 2

My apologies for the error.
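
The code above counts non-missing answers to each item separately. With exactly two waves per person that should amount to requiring both items in both waves, but with more than two observations per person the two counts could come from different waves. A possible variant, offered only as a sketch, counts the waves in which both items are answered in the same observation. The toy ids and the names both_answered and n_both are hypothetical and chosen for illustration.

```stata
* Toy data with hypothetical ids; replace with the real dataset
clear
input long pidp int dvage byte jbsat float abused
1 40 6 0
1 41 5 0
2 30 . 0
2 31 4 0
3 50 7 1
end

* 1 if both items are answered in this observation, 0 otherwise
* (missing() returns 1 when any of its arguments is missing)
gen byte both_answered = !missing(jbsat, abused)

* Per respondent, count the waves in which both items were answered
by pidp, sort: egen n_both = total(both_answered)

* Keep respondents who answered both items in at least two waves
keep if n_both >= 2
```

In this toy example only respondent 1 remains: respondent 2 skipped jbsat in one wave and respondent 3 appears in only one wave.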