  • Requesting help creating a variable measuring duration of unemployment

    Hello, so for my dissertation I want to analyse the impact of duration of unemployment on a person's BMI. To do this I need to code a variable that measures the number of years since a person lost their job.

    I have panel data which includes a dummy variable in each wave showing if a person is employed (variable = 1) or unemployed (variable = 0). I want the new duration variable to show the number of waves since the employment variable changed from 1 to 0. So if employed currently equals 1 (i.e. they currently have a job) the duration equals zero. If employment equalled 1 in the previous wave and 0 in the current wave (they had a job in the last wave but have since lost it), duration should equal 1, etc.

    The dataset has wave and unique identifier variables, but the dataset is too large to manually input the new variable. If anyone has any tips on how I could go about creating this variable I would be very grateful. I hope my definition of what I'm trying to create is clear; please let me know if there are any other details that would be useful to you.

  • #2
    You do not provide example data, and your description leaves much to the imagination. I have used my imagination to create a toy data set that, hopefully, is similar to what you have. If not, we have both wasted our time.

    Code:
    //  CREATE TOY DATA SET
    clear*
    set obs 10
    gen id = _n
    expand 5
    by id, sort: gen wave = _n
    set seed 1234
    gen byte employed = runiformint(0, 1)
    
    //  IDENTIFY SPELLS OF UNEMPLOYMENT AND CALCULATE THEIR DURATION
    by id (wave), sort: gen spell_num = sum(employed != employed[_n-1])
    by id spell_num (wave), sort: gen duration = _N if employed == 0

    Note: This code assumes that every person has data in every wave of the survey. This is rarely true in real-life surveys. But, if it is not true, then your question is ill-posed and cannot be answered anyway. For example, if a person is unemployed in waves 1 and 3, but does not appear in wave 2, then it is impossible to know whether we have one unemployment spell of duration 3, or two separate unemployment spells of duration 1 each (with the person, unbeknownst to us, having been employed at wave 2).
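
One way to check the every-wave assumption before running the spell code is sketched below. It is only a sketch on a made-up five-row panel, not the poster's data. The variable names gap and has_gap are hypothetical, and it assumes waves are numbered as consecutive integers.

```stata
// Toy panel: id 1 is observed in every wave, id 2 is missing wave 2
clear
input id wave employed
1 1 1
1 2 0
1 3 0
2 1 0
2 3 0
end

// Flag any observation whose wave does not directly follow the previous one
by id (wave), sort: gen byte gap = wave != wave[_n-1] + 1 if _n > 1

// Person-level indicator: 1 if the person has at least one gap
by id: egen byte has_gap = max(gap)
replace has_gap = 0 if missing(has_gap)

list, sepby(id)
```

Here id 2 is flagged because wave 3 follows wave 1 directly. For anyone flagged this way, the spell durations cannot be interpreted unambiguously.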

    In the future, when asking for help with code, show an example of your data, and please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
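
As an illustration of the kind of call described above, the sketch below uses the auto dataset that ships with Stata as a stand-in for your own data. The choice of variables and rows is arbitrary. It assumes -dataex- is already installed.

```stata
// Load a dataset that ships with Stata (a stand-in for your own data)
sysuse auto, clear

// Display the first five observations of three variables as pasteable code
dataex make price mpg in 1/5
```

The output in the Results window can be copied into a post as is. For panel data like that discussed here, the analogous call would list the identifier, wave and employment variables.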

    Added:
    "The dataset has wave and unique identifier variables, but the dataset is too large to manually input the new variable."
    This is backwards thinking. You don't need an excuse to create variables using code in Stata instead of manually inputting them. In fact, you should never manually create derived variables; it is error-prone and leaves no audit trail of the calculation. Only completely raw data should be input by hand. After that, everything should be automated, with a complete log of all the steps. This applies not just to dissertation work but to anything that you would ever ask anybody else to take seriously. Anything less is just playing around and is not acceptable scientific practice.


    Last edited by Clyde Schechter; 01 Jan 2022, 19:37.



    • #3
      Thank you for your response to my question. I apologise that I did not include an example of my data; I am new here and should have read the guidance more closely.

      I have used the dataex command to produce the data below. If you look at id 4031, they are unemployed in waves 4 and 5, but the duration says 2 for both waves; ideally I need the duration to say 1 in wave 4 and 2 in wave 5. (So each row should show how long they have been unemployed so far, rather than the total length of this spell of unemployment. I hope that is a clear description.)

      Is it possible to adapt the code so the data shows what I need, as outlined above?

      Regarding your point on a balanced sample: yes, the dataset is restricted to a balanced sample, so each person is in every wave. Sorry for not clarifying this at the start.

      Thank you for your assistance so far, and I am grateful for any further help you can offer.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input float(id employed wave spell_num duration)
      4006 1  1 1 .
      4006 1  2 1 .
      4006 1  3 1 .
      4006 1  4 1 .
      4006 .  5 2 .
      4006 .  6 2 .
      4006 .  7 2 .
      4006 .  8 2 .
      4006 .  9 2 .
      4006 . 10 2 .
      4006 . 11 2 .
      4031 .  1 0 .
      4031 1  2 1 .
      4031 1  3 1 .
      4031 0  4 2 2
      4031 0  5 2 2
      4031 1  6 3 .
      4031 1  7 3 .
      4031 0  8 4 1
      4031 .  9 5 .
      4031 . 10 5 .
      4031 . 11 5 .
      4039 1  1 1 .
      4039 1  2 1 .
      4039 1  3 1 .
      4039 1  4 1 .
      4039 0  5 2 1
      4039 .  6 3 .
      4039 1  7 4 .
      4039 1  8 4 .
      4039 0  9 5 1
      4039 1 10 6 .
      4039 . 11 7 .
      4041 .  1 0 .
      4041 1  2 1 .
      4041 1  3 1 .
      4041 0  4 2 2
      4041 0  5 2 2
      4041 1  6 3 .
      4041 .  7 4 .
      4041 1  8 5 .
      4041 .  9 6 .
      4041 . 10 6 .
      4041 1 11 7 .
      4042 .  1 0 .
      4042 1  2 1 .
      4042 1  3 1 .
      4042 0  4 2 2
      4042 0  5 2 2
      4042 1  6 3 .
      4042 1  7 3 .
      4042 0  8 4 1
      4042 .  9 5 .
      4042 . 10 5 .
      4042 . 11 5 .
      4180 1  1 1 .
      4180 1  2 1 .
      4180 1  3 1 .
      4180 1  4 1 .
      4180 1  5 1 .
      4180 .  6 2 .
      4180 1  7 3 .
      4180 1  8 3 .
      4180 1  9 3 .
      4180 0 10 4 1
      4180 1 11 5 .
      5002 .  1 0 .
      5002 .  2 0 .
      5002 .  3 0 .
      5002 .  4 0 .
      5002 .  5 0 .
      5002 .  6 0 .
      5002 .  7 0 .
      5002 .  8 0 .
      5002 .  9 0 .
      5002 . 10 0 .
      5002 . 11 0 .
      5003 1  1 1 .
      5003 1  2 1 .
      5003 1  3 1 .
      5003 1  4 1 .
      5003 1  5 1 .
      5003 1  6 1 .
      5003 1  7 1 .
      5003 1  8 1 .
      5003 1  9 1 .
      5003 1 10 1 .
      5003 1 11 1 .
      5004 0  1 1 2
      5004 0  2 1 2
      5004 1  3 2 .
      5004 1  4 2 .
      5004 .  5 3 .
      5004 .  6 3 .
      5004 1  7 4 .
      5004 1  8 4 .
      5004 1  9 4 .
      5004 1 10 4 .
      5004 1 11 4 .
      5005 1  1 1 .
      end



      • #4
        Thanks for the data example and the explanation. I had interpreted your original request as being for a cumulative duration. To get the duration counting up, replace the code in #2 with
        Code:
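        // Start a new spell whenever employment status changes within a person
        by id (wave), sort: gen spell_num = sum(employed != employed[_n-1])
        // Within each unemployment spell, count the waves elapsed so far
        by id spell_num (wave), sort: gen duration_so_far = _n if employed == 0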
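
To see what the revised code produces, here is a small worked sketch on the rows for id 4031 from the -dataex- example in #3. The final replace line is an assumption added for illustration: it follows the request in #1 that the duration be zero while a person is employed, and is not part of the code given in this thread.

```stata
// Rows for id 4031, copied from the -dataex- example in #3
clear
input float(id employed wave)
4031 .  1
4031 1  2
4031 1  3
4031 0  4
4031 0  5
4031 1  6
4031 1  7
4031 0  8
4031 .  9
4031 . 10
4031 . 11
end

// Spell counter and running duration, exactly as in the code above
by id (wave), sort: gen spell_num = sum(employed != employed[_n-1])
by id spell_num (wave), sort: gen duration_so_far = _n if employed == 0

// Assumption (from #1): duration is 0 in waves where the person is employed
replace duration_so_far = 0 if employed == 1

list, sepby(id)
```

With these rows, duration_so_far is 1 in wave 4, 2 in wave 5 and 1 in wave 8. It stays missing wherever employed is missing, because the employment status in those waves is unknown.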



        • #5
          Thank you, this code works perfectly. I apologise for the misunderstanding about exactly what I needed the new variable to show; I shall try to be clearer in future posts.

          Thank you again, I can now continue my work.
