Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reshape Wide to Long With Multiple Suffixes (year not being at the end)

    Dear All,

    I'm trying to reshape my employment data from a wide to a long format. I generally have the problem however that the variables are in a wide format both in terms of year, as well as in terms of job number. What I mean by this is that people might have held multiple jobs in the same year and each one is found in a new variables. The variable name structure therefore looks as follows: JobTenure_1987_03, where the first part is the variable, the second part (1987) the year, and the third part (here: 03) the number of the job held this year about which there is information (i.e. the variable provides information about tenure at the third job that individual has held in 1987. I am now looking get the data into a long format and wondering how to do so. I am not sure whether I should basically reshape the data twice (and have both variables for the number of the job entry in a specific year, as well as a variable for the year) or just using the year as a long format variable. Does anybody have any thoughts on the differences this would make in my analysis and what my stata syntax should look like to make this process efficient? I'm asking about the latter part as I have several dozen variables with the structure described above and I'm wondering whether there is a smart way to reshape this.

    Thanks for the help,
    J

  • #2
    Whether you would be better off with the data fully long (long in both year and job number) or partly long (long in year but wide in job number) depends on what analyses you have planned. I would say that most of the time, fully long would be better, but some things might go better in partly long. So your question cannot be answered without more information.

    That said, the easiest way to go fully long is not to do two -reshape-s. Instead do a single reshape that creates a combined year_jobnum variable (don't forget the -string- option for this), and then -split- that variable, parsing on the "_" character.

    If you want actual code, you will need to use the -dataex- command and post example data.

    Comment


    • #3
      Thanks for getting back to me Clyde Schechter. To be quite honest, I can't specifically say yet what my analysis is going to look like, but thinking about it, a fully long format is probably the better option. I've tried to post some example data below. It would be great if you could show me what kind of code you would use to tackle the problem. Your solution sounds a bit more elegant than me doing every step manually.

      To provide some further detail, I have included the identifier and 5 variables that should end up being the same one in long format. The first two are from 1980 (job number 1 and number 5 held by an individual in the same year) the latter three are from 1981 (job numbers 1,5,and 11 - I've included more here because I don't have the same number of jobs each year). Is this enough for you to work with?

      Thanks so much for the help.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int(CASEID_1979 EMPall_TENURE_1980_01) byte EMPall_TENURE_1980_05 int(EMPall_TENURE_1981_01 EMPall_TENURE_1981_05) byte EMPall_TENURE_1981_11
       4276  -4 -4 -4 -4 -4
       4853  -4 -4 -4 -4 -4
       9311  53 -4 -4 -4 -4
      11932  -4 -4 -4 -4 -4
      12408 155 -4 -4 -4 -4
       1889  -4 -4 -4 -4 -4
       8688  -4 -4 -4 -4 -4
       8511   5 -4 -4 15 -4
       6818  -4 -4 -4 -4 -4
       5122  -4 -4 -4 -4 -4
      end
      label values EMPall_TENURE_1980_01 vle4710100
      label values EMPall_TENURE_1980_05 vle4710500
      label values EMPall_TENURE_1981_01 vle4720100
      label values EMPall_TENURE_1981_05 vle4720500
      label values EMPall_TENURE_1981_11 vle4721100

      Comment


      • #4
        Code:
        reshape long EMPall_TENURE_, i( CASEID_1979 ) j(year_job) string
        split year_job, parse(_)
        And then rename your variables to something you can work with

        Comment


        • #5
          Jorrit's response in #4 is essentially what I would have suggested. The only thing I would have done differently is added the -destring- option to the -split- command so that you would end up with numeric variables without having to write an additional command to do that. I imagine that year and job number would be more useful as numeric variables than as strings in nearly all situations.

          Comment


          • #6
            Thanks for the help you two.

            Comment


            • #7
              Having added a few extra variables, I now have some which vary for both the job number and year and others which only vary per year. Is there an elegant way to reshape this into long format such that everything is reshaped into long even though some have the year and job stub, whereas others only have the year stub? (in the example below EmpAllHOURSWEEK_ has the year and job number stub, whereas AGEATINT_ only has the year stub).

              Thanks again!

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input byte(EmpAllHOURSWEEK_1992_01 EmpAllHOURSWEEK_1992_02 EmpAllHOURSWEEK_1993_01 EmpAllHOURSWEEK_1993_02 AGEATINT_1992 AGEATINT_1993)
              -4 -4 -4 -4 -5 -5
              -4 -4 -4 -4 35 36
              -4 -4 -4 -4 -5 -5
              -4 40 -4 -4 28 29
              -4 -4 -4 -4 28 29
              -4 -4 -4 -4 33 34
              -4 -4 -4 -4 31 32
              -4 -4 -4 -4 30 31
              -4 -4 -4 -4 28 29
              -4 -4 -4 -4 31 32
              -4 -4 -4 -4 33 34
              -4 -4 -4 -4 -5 -5
              -4 -4 -4 -4 -5 -5
              -4 -4 -4 -4 -5 -5
              -4 -4 -4 -4 28 28
              -4 -4 -4 -4 29 30
              -4 -4 -4 -4 -5 -5
              -4 -4 -4 -4 33 34
              -4 -4 -4 -4 29 30
              -4 -4 -4 -4 32 34
              -4 -4 -4 -4 -5 28
              -4 -4 -4 -4 31 32
              -4 -4 -4 -4 -5 -5
              -4 -4 -4 -4 -5 -5
              -4 -4 -4 -4 31 32
              45 -4 45 -4 33 34
              -4 40 -4 40 32 33
              -4 -4 -4 -4 -5 -5
              -4 -4 -4 -4 31 32
              -4 -4 -4 -4 28 29
              -4 -4 -4 -4 33 34
              -4 -4 -4 -4 -5 -5
              end
              label values EmpAllHOURSWEEK_1992_01 vlE5790100
              label values EmpAllHOURSWEEK_1992_02 vlE5790200
              label values EmpAllHOURSWEEK_1993_01 vlE5800100
              label values EmpAllHOURSWEEK_1993_02 vlE5800200
              label values AGEATINT_1992 vlR4007600
              label def vlR4007600 28 "28", modify
              label def vlR4007600 29 "29", modify
              label def vlR4007600 30 "30", modify
              label def vlR4007600 31 "31", modify
              label def vlR4007600 32 "32", modify
              label def vlR4007600 33 "33", modify
              label def vlR4007600 35 "35", modify
              label values AGEATINT_1993 vlR4418700

              Comment

              Working...
              X