  • time variable in longitudinal dataset

    Hello,

    I have a five-quarter longitudinal dataset from the UK Labour Force Survey (LFS), covering April 2019 to June 2020. The LFS has a rotating panel, so we can follow people for five consecutive quarters (five waves).

    The first thing I need to define is the time index for this panel data. An example of the data is below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(WEEK W1YR) double(PERSID HOURPAY1) byte(HOURPAY2 HOURPAY3 HOURPAY4) double HOURPAY5 byte(HHLD FLOW FLEXW75 FLEXW74 FLEXW73 QRTR)
     7 9 10792030101    -9 -9 -9 -9    -9 1 12 -9 -9 -9 2
    12 9 11292020101  7.54 -9 -9 -9   9.5 1  3  2  2  2 2
     5 9 20592040101 48.07 -9 -9 -9 43.68 1  3  2  2  2 2
     5 9 20592040102    -9 -9 -9 -9    -9 1  3  2  2  2 2
     5 9 20592040103    -9 -9 -9 -9    -9 1  2 -9  2 -9 2
    end
    label values WEEK WEEK
    label values W1YR W1YR
    label values HOURPAY1 HOURPAY1
    label def HOURPAY1 -9 "Does not apply", modify
    label values HOURPAY2 HOURPAY2
    label def HOURPAY2 -9 "Does not apply", modify
    label values HOURPAY3 HOURPAY3
    label def HOURPAY3 -9 "Does not apply", modify
    label values HOURPAY4 HOURPAY4
    label def HOURPAY4 -9 "Does not apply", modify
    label values HOURPAY5 HOURPAY5
    label def HOURPAY5 -9 "Does not apply", modify
    label values HHLD HHLD
    label values FLOW FLOW
    label def FLOW 2 "Entrant to working-age between first and final quarter", modify
    label def FLOW 3 "In employment at first quarter; in employment at final quarter (EE)", modify
    label def FLOW 12 "Reached retirement age by final quarter", modify
    label values FLEXW75 FLEXW75
    label def FLEXW75 -9 "Does not apply", modify
    label def FLEXW75 2 "No", modify
    label values FLEXW74 FLEXW74
    label def FLEXW74 -9 "Does not apply", modify
    label def FLEXW74 2 "No", modify
    label values FLEXW73 FLEXW73
    label def FLEXW73 -9 "Does not apply", modify
    label def FLEXW73 2 "No", modify
    label values QRTR QRTR
    label def QRTR 2 "AJ (April to June)", modify


    There is a variable "PERSID", which means persistent identifier; a variable "QRTR", which means the year that the address first entered the survey; a variable "FLOW", which means categories relating to labour force gross flows; a variable "FLEXW73", which indicates a worker who has a zero-hours contract; and a variable "HOURPAY", which means how much a worker gets per hour worked in a week. Because I have longitudinal data, the majority of variables are repeated 5 times. In other words, "HOURPAY1" is for quarter 1, "HOURPAY2" is for quarter 2, "HOURPAY3" is for quarter 3, "HOURPAY4" is for quarter 4, and "HOURPAY5" is for quarter 5. However, the variable FLEXW7 only starts from the third quarter (the question behind this variable is asked just twice per year), so the dataset has this variable for three quarters only (FLEXW73, FLEXW74, and FLEXW75), not for all quarters.


    My questions are:
    1- How can I use the variable "QRTR" as a time variable? For example, suppose I need to measure the flow of a group of workers (FLEXW7) from the third quarter to the last quarter (the increase or decrease in workers on this type of contract). What do I have to do in this case?

    2- Is there any way to combine the variables "FLEXW73", "FLEXW74", and "FLEXW75" into one variable? (Note that I need to find the flow of this variable over all the quarters.)

    3- Is this dataset in long format or wide format?

    4- Which command can I use to analyse "FLOW" for FLEXW73 to FLEXW75, in each quarter and across all quarters?


    Many thanks,



  • #2
    This data set is currently in wide layout, and the solution to your questions 2 and 4 is to convert it to long:

    Code:
    isid PERSID, sort
    reshape long HOURPAY FLEXW7, i(PERSID) j(quarter)

    I do not understand question 1 because I cannot comprehend your description of the variable QRTR. I am left with no understanding of what it does or how it relates to any other variables or to the circumstances of administration of the survey.

    Since the survey was administered over 5 quarters, the variable quarter, created by the -reshape- command, can serve as a time variable to declare this as panel data with PERSID as the panel variable.
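
    A minimal sketch of that workflow on a made-up two-person dataset (the values are invented for illustration; only the variable names come from the example above). It reshapes to long, declares the panel with -xtset-, and tabulates FLEXW7 by quarter. If -xtset- objects to the real PERSID values, building a compact identifier with -egen id = group(PERSID)- is one common workaround; that step is an assumption, not something this dataset has been shown to need.

    ```stata
    * Toy data: two people, HOURPAY in 5 quarters, FLEXW7 only in quarters 3-5
    clear
    input double(PERSID HOURPAY1 HOURPAY2 HOURPAY3 HOURPAY4 HOURPAY5) byte(FLEXW73 FLEXW74 FLEXW75)
    1  7.5  7.5  8.0  8.0  9.5 2 2 2
    2 48.0 48.0 45.0 44.0 43.7 2 1 1
    end

    * Confirm PERSID uniquely identifies observations in the wide layout
    isid PERSID, sort

    * One observation per person per quarter; FLEXW7 is missing in quarters 1 and 2
    reshape long HOURPAY FLEXW7, i(PERSID) j(quarter)

    * Declare the panel structure: PERSID is the panel, quarter is time
    xtset PERSID quarter

    * Distribution of FLEXW7 in each quarter, showing missing values too
    tabulate quarter FLEXW7, missing
    ```

    After -xtset-, time-series operators such as L.FLEXW7 refer to the same person's value in the previous quarter, which is one way to look at movement into or out of a category between quarters.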



    • #3
      In reply to Clyde Schechter (#2):

      Thanks for your help.

      The variable QRTR looks like this:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input byte QRTR
      2
      2
      2
      2
      2
      end
      label values QRTR QRTR
      label def QRTR 2 "AJ (April to June)", modify

      So I thought it might be used as a time variable, with no need to generate a new one.


      I converted the data to long format, but I got missing values in the "FLEXW7" variable because this variable has values in 3 quarters only, while other variables have values for 5 quarters. When I reshape to long format, the observations for quarters 1 and 2 are all missing in the "FLEXW7" variable, but they are still rows in the dataset.

      I tried this:
      Code:
       drop FLEXW7 if FLEXW7==. 

      but it seems to remove everyone, which biases my data and changes the number of observations.

      Is there any way to deal with missing values in this case?


      Thanks
      Ali
      Last edited by Ali Abutaleb; 06 Jul 2022, 13:01.



      • #4
        Well, the variable QRTR is just a single value for each PERSID, so it cannot possibly distinguish different time periods. I have no idea what it is supposed to represent. You should review the survey documentation provided by whoever curates this survey to get more information about it. Whatever it is, it is not a time variable for panel data.

        I converted the data to long format, but I got missing values in the "FLEXW7" variable because this variable has values in 3 quarters only, while other variables have values for 5 quarters. When I reshape to long format, the observations for quarters 1 and 2 are all missing in the "FLEXW7" variable, but they are still rows in the dataset.
        This is exactly how it is supposed to work. Because FLEXW7 was only ascertained in quarters 3 through 5, there is no information about what it might have been in quarters 1 and 2. To have full panel data for the survey, you need to have an observation for each PERSID in each of the 5 quarters. Since there is no information about FLEXW7 in quarters 1 and 2, that shows up as missing values for that variable in those observations ("rows"). It's exactly right. Don't tamper with it!

        That said, I will comment on your -drop FLEXW7 if FLEXW7==. - command. In Stata, -drop- can be used in two different ways. What you wrote attempts to mix the two up, and is not possible. One way to use drop is to drop observations ("rows") that meet a certain condition. So, if you said -drop if FLEXW7 == .-, you would remove all observations that have a missing value for the variable FLEXW7. Notice that in this -drop- command there is no variable named after the word -drop-. The other use of -drop- is to remove entire variables ("columns") from the data. If you wrote -drop FLEXW7-, the variable FLEXW7 would disappear altogether. Notice that there is no -if- condition in this version of -drop-. When you try to put them together, as you did, the result is illegal syntax, and whatever operation you think might correspond to that syntax, were it legal, does not exist in Stata. You cannot -drop- a variable only in certain observations, nor -drop- an observation only in certain variables. In situations where an observation lacks information for a particular variable, that is represented with a missing value.
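
        The two legal forms of -drop- can be tried side by side on a small made-up dataset. This is only an illustrative sketch: the values are invented, and -preserve- / -restore- are used so that each form starts from the same data.

        ```stata
        * Toy long-format data: FLEXW7 is missing in quarter 1 for both people
        clear
        input byte(id quarter FLEXW7)
        1 1 .
        1 3 2
        2 1 .
        2 3 1
        end

        * Form 1: drop observations (rows) that meet a condition; the variable stays
        preserve
        drop if FLEXW7 == .
        list
        restore

        * Form 2: drop a variable (column); every observation stays
        preserve
        drop FLEXW7
        list
        restore
        ```

        Note that -drop if FLEXW7 == .- removes only the system missing value; -drop if missing(FLEXW7)- would also remove extended missing values such as .a.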

        Now, you might perhaps want to distinguish the missing values of FLEXW7 that arise in quarters 1 and 2, when the question was not asked, from other missing values of FLEXW7 that might arise because the respondent gave no answer to the question. If so, you can use one of Stata's "extended missing values" for that. (See -help missing- if you are not familiar with extended missing values.) So you could do something like:
        Code:
        replace FLEXW7 = .a if inlist(quarter, 1, 2)
        label define FLEXW75 .a "Not Asked", add

        But only do this if you will have a need during your analyses to distinguish missingness due to not being asked from missingness due to not being answered.



        • #5
          In reply to Clyde Schechter (#4):

          I really appreciate your reply and your suggestion for dealing with this. The last thing I am trying to figure out concerns the time variable generated when I reshaped the data, which now takes the values 1 through 5. I would like it to display as "April - June 2019" instead of "1", "July - Sep 2019" instead of "2", and so on. I tried to find the best way to format the variable, but I could not.

          Thanks for your help,
          Ali
          Last edited by Ali Abutaleb; 07 Jul 2022, 11:36.



          • #6
            Code:
            label define quarter    1   "Apr-Jun 2019"   ///
                                    2   "Jul-Sept 2019"    ///
                                    3   "Oct-Dec 2019"  ///
                                    4   "Jan-Mar 2020"  ///
                                    5   "Apr-Jun 2020"
            label values quarter quarter

            This will change what shows in displays, listings, regression outputs, etc. It, fortunately, does not change the internal storage of the variable, which remains 1 to 5 so it can actually be used for computations.
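
            A different technique, sketched here only as an alternative to value labels, is to build a genuine Stata quarterly date. It assumes, as in the labels above, that quarter 1 corresponds to April to June 2019, which is calendar quarter 2 of 2019. The variable name qdate is made up for this example.

            ```stata
            * Toy data: the five wave numbers produced by -reshape-
            clear
            input byte quarter
            1
            2
            3
            4
            5
            end

            * yq(2019, 2) is the quarterly date for 2019q2; add the wave offset
            generate qdate = yq(2019, 2) + quarter - 1

            * Display the values as quarterly dates rather than raw integers
            format qdate %tq

            list quarter qdate
            ```

            A variable built this way carries the calendar year as well as the quarter, so it could also serve as the time variable in -xtset- in place of the 1 to 5 wave number.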



            • #7
              Hi, can you guide me on how to get longitudinal data from the five-quarter UK LFS data? Thanks.



              • #8
                Hello,

                Following on from this thread.

                I am using the same data. I wish to combine multiple datasets from different years (although each dataset is five-quarter). I understand that QRTR can be used as a time variable; however, must I create a new time variable indicating the year? In either case, how should I proceed to combine these datasets: append or merge?

                Any guidance would be greatly appreciated.
