time variable in longitudinal dataset

Ali Abutaleb

Join Date: Jun 2022

Posts: 36
#1


06 Jul 2022, 05:56

Hello,

I have a five-quarter longitudinal dataset from the UK Labour Force Survey (LFS), covering April 2019 to June 2020. The LFS has a rotating panel design, so we can follow people for five consecutive quarters (five waves).

The first thing I need to define is the time index for this panel data. Here is an example of the data:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte(WEEK W1YR) double(PERSID HOURPAY1) byte(HOURPAY2 HOURPAY3 HOURPAY4) double HOURPAY5 byte(HHLD FLOW FLEXW75 FLEXW74 FLEXW73 QRTR)
7 9 10792030101 -9 -9 -9 -9 -9 1 12 -9 -9 -9 2
12 9 11292020101 7.54 -9 -9 -9 9.5 1 3 2 2 2 2
5 9 20592040101 48.07 -9 -9 -9 43.68 1 3 2 2 2 2
5 9 20592040102 -9 -9 -9 -9 -9 1 3 2 2 2 2
5 9 20592040103 -9 -9 -9 -9 -9 1 2 -9 2 -9 2
end
label values WEEK WEEK
label values W1YR W1YR
label values HOURPAY1 HOURPAY1
label def HOURPAY1 -9 "Does not apply", modify
label values HOURPAY2 HOURPAY2
label def HOURPAY2 -9 "Does not apply", modify
label values HOURPAY3 HOURPAY3
label def HOURPAY3 -9 "Does not apply", modify
label values HOURPAY4 HOURPAY4
label def HOURPAY4 -9 "Does not apply", modify
label values HOURPAY5 HOURPAY5
label def HOURPAY5 -9 "Does not apply", modify
label values HHLD HHLD
label values FLOW FLOW
label def FLOW 2 "Entrant to working-age between first and final quarter", modify
label def FLOW 3 "In employment at first quarter; in employment at final quarter (EE)", modify
label def FLOW 12 "Reached retirement age by final quarter", modify
label values FLEXW75 FLEXW75
label def FLEXW75 -9 "Does not apply", modify
label def FLEXW75 2 "No", modify
label values FLEXW74 FLEXW74
label def FLEXW74 -9 "Does not apply", modify
label def FLEXW74 2 "No", modify
label values FLEXW73 FLEXW73
label def FLEXW73 -9 "Does not apply", modify
label def FLEXW73 2 "No", modify
label values QRTR QRTR
label def QRTR 2 "AJ (April to June)", modify

There is a variable PERSID, the persistent identifier; a variable QRTR, which indicates when the address first entered the survey; a variable FLOW, with categories relating to labour force gross flows; a variable FLEXW7, which identifies a worker on a zero-hours contract; and a variable HOURPAY, which is how much the person is paid per hour worked in a week. Because the data are longitudinal, most variables are repeated five times: HOURPAY1 is for quarter 1, HOURPAY2 for quarter 2, HOURPAY3 for quarter 3, HOURPAY4 for quarter 4, and HOURPAY5 for quarter 5. However, FLEXW7 starts only from the third quarter (the question is asked only twice a year), so the dataset has this variable for three quarters only (FLEXW73, FLEXW74, and FLEXW75), not for all five.

My questions are:
1- How can I use the variable QRTR as a time variable? For example, if I need to measure the flow of a group of workers (FLEXW7) from the third quarter to the last quarter (the increase or decrease in workers on this type of contract), what do I have to do?

2- Is there any way to combine FLEXW73, FLEXW74, and FLEXW75 into one variable? (Note that I need to find the flow of this variable over all the quarters.)

3- Is this dataset in long format or wide format?

4- How can I use commands to analyse FLOW for FLEXW73 ... FLEXW75 in each quarter and also across all quarters?

Many thanks,
Clyde Schechter

Join Date: Apr 2014

Posts: 29801
#2

06 Jul 2022, 10:35

This data set is currently in wide layout, and the solution to your questions 2 and 4 is to convert it to long:

Code:

isid PERSID, sort
reshape long HOURPAY FLEXW7, i(PERSID) j(quarter)

I do not understand question 1 because I cannot comprehend your description of the variable QRTR. I am left with no understanding of what it does or how it relates to any other variables or to the circumstances of administration of the survey.

Since the survey was administered over 5 quarters, the variable quarter, created by the -reshape- command, can serve as a time variable to declare this as panel data with PERSID as the panel variable.
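A minimal sketch of these steps, run as a do-file, using only part of the -dataex- example from post #1 (PERSID, HOURPAY1-HOURPAY5 and FLEXW73-FLEXW75; the other variables are omitted). After -reshape long-, each person has one observation per quarter, and FLEXW7 is missing in quarters 1 and 2 because FLEXW71 and FLEXW72 do not exist. The final -tabulate- is only one illustrative way to look at FLEXW7 quarter by quarter; it was not posted in the thread. Note that -9 ("Does not apply") is an ordinary number to Stata, not a missing value.

```stata
* Subset of the -dataex- example from post #1 (other variables omitted)
clear
input double(PERSID HOURPAY1) byte(HOURPAY2 HOURPAY3 HOURPAY4) double HOURPAY5 byte(FLEXW73 FLEXW74 FLEXW75)
10792030101 -9 -9 -9 -9 -9 -9 -9 -9
11292020101 7.54 -9 -9 -9 9.5 2 2 2
20592040101 48.07 -9 -9 -9 43.68 2 2 2
20592040102 -9 -9 -9 -9 -9 2 2 2
20592040103 -9 -9 -9 -9 -9 -9 2 -9
end
format PERSID %12.0f    // display the full 11-digit identifier

* Wide -> long: one observation per person per quarter
isid PERSID, sort
reshape long HOURPAY FLEXW7, i(PERSID) j(quarter)

* Declare the panel: PERSID identifies people, quarter (1-5) is time
xtset PERSID quarter

* FLEXW7 by quarter; quarters 1 and 2 appear only as missing
tabulate quarter FLEXW7, missing
```

The five people times five quarters give 25 observations, which is the balanced panel described above.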
Ali Abutaleb

Join Date: Jun 2022

Posts: 36
#3

06 Jul 2022, 12:55

Thanks for your help.

The variable QRTR looks like this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte QRTR
2
2
2
2
2
end
label values QRTR QRTR
label def QRTR 2 "AJ (April to June)", modify

so I thought it might be used as a time variable, with no need to generate a new one.

I converted the data to long format, but I got missing values in the FLEXW7 variable, because this variable has values in only 3 quarters while the other variables have values for all 5 quarters. After reshaping to long format, FLEXW7 is missing for everyone in quarters 1 and 2, but those rows are still in the dataset.

I tried this:

Code:

drop FLEXW7 if FLEXW7==.

but it seems to remove everyone, and this biases my data and the number of observations.

Is there any way to deal with the missing values in this case?

Thanks
Ali

Last edited by Ali Abutaleb; 06 Jul 2022, 13:01.
Clyde Schechter

Join Date: Apr 2014

Posts: 29801
#4

06 Jul 2022, 13:22

Well, the variable QRTR is just a single value for each PERSID, so it cannot possibly distinguish different time periods. I have no idea what it is supposed to represent. You should review the survey documentation provided by whoever curates this survey to get more information about it. Whatever it is, it is not a time variable for panel data.
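A quick way to check the point above, sketched as a do-file on the PERSID, HOURPAY1-HOURPAY5 and QRTR values from the -dataex- example in post #1: after -reshape long-, QRTR is simply copied into every quarter of the same person, so it cannot tell the quarters apart, whereas the new variable quarter takes a different value in each of a person's rows.

```stata
* PERSID, HOURPAY1-HOURPAY5 and QRTR from the -dataex- example in post #1
clear
input double(PERSID HOURPAY1) byte(HOURPAY2 HOURPAY3 HOURPAY4) double HOURPAY5 byte QRTR
10792030101 -9 -9 -9 -9 -9 2
11292020101 7.54 -9 -9 -9 9.5 2
20592040101 48.07 -9 -9 -9 43.68 2
20592040102 -9 -9 -9 -9 -9 2
20592040103 -9 -9 -9 -9 -9 2
end

reshape long HOURPAY, i(PERSID) j(quarter)

* QRTR is identical in all five rows of each person...
bysort PERSID (quarter): assert QRTR == QRTR[1]
* ...while quarter runs 1, 2, ..., 5 within each person
by PERSID (quarter): assert quarter == _n
```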

Originally posted by Ali Abutaleb

I converted the data to long format, but I got missing values in the FLEXW7 variable, because this variable has values in only 3 quarters while the other variables have values for all 5 quarters. After reshaping to long format, FLEXW7 is missing for everyone in quarters 1 and 2, but those rows are still in the dataset.

This is exactly how it is supposed to work. Because FLEXW7 was only ascertained in quarters 3 through 5, there is no information about what it might have been in quarters 1 and 2. To have full panel data for the survey, you need to have an observation for each PERSID in each of the 5 quarters. Since there is no information about FLEXW7 in quarters 1 and 2, that shows up as missing values for that variable in those observations ("rows"). It's exactly right. Don't tamper with it!

That said, I will comment on your -drop FLEXW7 if FLEXW7==. - command. In Stata, -drop- can be used in two different ways. What you wrote attempts to mix the two up, and is not possible. One way to use drop is to drop observations ("rows") that meet a certain condition. So, if you said -drop if FLEXW7 == .-, you would remove all observations that have a missing value for the variable FLEXW7. Notice that in this -drop- command there is no variable named after the word -drop-. The other use of -drop- is to remove entire variables ("columns") from the data. If you wrote -drop FLEXW7-, the variable FLEXW7 would disappear altogether. Notice that there is no -if- condition in this version of -drop-. When you try to put them together, as you did, the result is illegal syntax, and whatever operation you think might correspond to that syntax, were it legal, does not exist in Stata. You cannot -drop- a variable only in certain observations, nor -drop- an observation only in certain variables. In situations where an observation lacks information for a particular variable, that is represented with a missing value.
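To make the two legal forms concrete, here is a sketch on hypothetical toy data (one person, FLEXW7 recorded only in quarters 3 to 5), run as a do-file. Each demonstration is wrapped in -preserve-/-restore- so the data come back unchanged. It uses missing(FLEXW7) rather than FLEXW7 == ., because missing() is also true for extended missing values such as .a.

```stata
* Hypothetical toy data: one person, FLEXW7 only asked in quarters 3-5
clear
input double PERSID byte(quarter FLEXW7)
1 1 .
1 2 .
1 3 2
1 4 2
1 5 2
end

* Form 1: -drop if- removes OBSERVATIONS; no variable after -drop-
preserve
drop if missing(FLEXW7)
count                          // 3
restore

* Form 2: -drop varlist- removes a whole VARIABLE; no -if- allowed
preserve
drop FLEXW7
describe, short
restore

* Often neither is wanted: limit a single command with -if- instead
tabulate FLEXW7 if quarter >= 3
```

The last line is one option for analysing only the quarters in which the question was asked while keeping every observation in the panel.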

Now, you might perhaps want to distinguish the missing values of FLEXW7 that arise in quarters 1 and 2, when the question was not asked, from other missing values of FLEXW7 that might arise because the respondent gave no answer to the question. If so, you can use one of Stata's "extended missing values" for that. (See -help missing- if you are not familiar with extended missing values.) So you could do something like:

Code:

replace FLEXW7 = .a if inlist(quarter, 1, 2)
label define FLEXW75 .a "Not Asked", add

But only do this if you will have a need during your analyses to distinguish missingness due to not being asked from missingness due to not being answered.
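A sketch of how the two kinds of missingness then behave, on hypothetical toy data (quarters 1 and 2 not asked, quarter 4 left unanswered), run as a do-file. The value label name flexw7_lbl is made up for this example; in the real data the label attached to FLEXW7 is used as shown above. The key point is that FLEXW7 == . matches only system missing, FLEXW7 == .a matches only "not asked", and missing(FLEXW7) matches both.

```stata
* Hypothetical toy data: quarters 1-2 not asked; quarter 4 left unanswered
clear
input double PERSID byte(quarter FLEXW7)
1 1 .
1 2 .
1 3 2
1 4 .
1 5 2
end

* Recode "not asked" to the extended missing value .a and label it
replace FLEXW7 = .a if inlist(quarter, 1, 2)
label define flexw7_lbl 2 "No" .a "Not Asked"   // hypothetical label name
label values FLEXW7 flexw7_lbl

count if FLEXW7 == .        // system missing only: 1 (not answered)
count if FLEXW7 == .a       // not asked: 2
count if missing(FLEXW7)    // any missing value: 3
tabulate quarter FLEXW7, missing
```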
Ali Abutaleb

Join Date: Jun 2022

Posts: 36
#5

07 Jul 2022, 11:34

I really appreciate your reply and your suggestion for dealing with this. The last thing I am trying to figure out is the time variable I generated when reshaping the data, which now takes the values 1 through 5. What I would like is to show "April - June 2019" instead of "1", "July - Sept 2019" instead of "2", and so on. I tried to find the best way to format the variable, but I could not.

Thank you for your help,
Ali

Last edited by Ali Abutaleb; 07 Jul 2022, 11:36.
Clyde Schechter

Join Date: Apr 2014

Posts: 29801
#6

07 Jul 2022, 13:45

Code:

label define quarter 1 "Apr-Jun 2019" ///
    2 "Jul-Sept 2019" ///
    3 "Oct-Dec 2019" ///
    4 "Jan-Mar 2020" ///
    5 "Apr-Jun 2020"
label values quarter quarter

This will change what shows in displays, listings, regression outputs, etc. It, fortunately, does not change the internal storage of the variable, which remains 1 to 5 so it can actually be used for computations.
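If a genuine calendar time variable is wanted instead of labels, one alternative (not suggested in the thread) is a Stata quarterly date. A sketch, run as a do-file and assuming quarter 1 corresponds to April-June 2019, i.e. calendar 2019q2: yq() builds the quarterly date and the %tq format displays it.

```stata
* Assumption: quarter 1 = Apr-Jun 2019 (calendar 2019q2), as in the labels above
clear
input byte quarter
1
2
3
4
5
end

* yq(2019, 2) is the Stata quarterly date for 2019q2; add 0-4 quarters
generate qdate = yq(2019, 2) + quarter - 1
format qdate %tq
list quarter qdate, noobs
```

With the real data one could then -xtset PERSID qdate-. Displays would read 2019q2 through 2020q2, and time-series operators still work because qdate is an ordinary number underneath.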
zealous zara

Join Date: Nov 2023

Posts: 2
#7

09 Nov 2023, 13:01

Hi, can you guide me on how to get longitudinal data from the five-quarter UK LFS data? Thanks.
Meabh Cairns

Join Date: Jan 2024

Posts: 4
#8

28 Jan 2024, 09:14

Hello,

Following on from this thread.

I am using the same data. I wish to combine multiple datasets from different years (each dataset is a five-quarter dataset). I understand that QRTR can be used as a time variable; however, must I create a new time variable indicating the year? In either case, how should I proceed to combine these datasets: append or merge?

Any guidance would be greatly appreciated.