  • How to normalize event date

    I have the following dataset, which records whether each individual is unemployed in each wave. I want to normalize my "wave" variable (wv) into a new variable that is 0 in the wave when the unemployment happened, with the other waves numbered relative to that event wave. For people who were never unemployed, the new variable should be missing. Note that I have dropped all individuals who experienced unemployment more than once.

    Wanted
    Code:
       +------------------------------+
       | prim_key wv unemploy timeline|
       |------------------------------|
    1. |        1   3      0       -2 |
    2. |        1   4      0       -1 |
    3. |        1   5      1        0 |
    4. |        1   6      0        1 |
       |------------------------------|
    5. |        2   1      1        0 |
    6. |        2   2      0        1 |
    7. |        2   3      0        2 |
    8. |        2   4      0        3 |
    9. |        2   5      0        4 |
       |------------------------------|
    10.|        3   7      0        . |
    11.|        3   8      0        . |
    12.|        3   9      0        . |
    13.|        3   10     0        . |
    14.|        3   11     0        . |
       +------------------------------+
    What I did seems too complicated, and I would like a simpler version.
    Code:
    * forward: carry the event wave forward to later waves
    gen prev = wv if unemploy == 1
    bysort prim_key wv (prev) : replace prev = prev[1] if prev[1] == 1
    bysort prim_key (wv) : replace prev = prev[_n-1] if mi(prev)
    gen forward = wv - prev
    
    * backward: carry the event wave back to earlier waves (sort by descending wave)
    gen negwv = -wv
    gen next = wv if unemploy == 1
    bysort prim_key negwv (next) : replace next = next[1] if next[1] == 1
    bysort prim_key (negwv) : replace next = next[_n-1] if mi(next)
    gen backward = wv - next
    
    * timeline: keep whichever is closer to zero (missing counts as larger than any number)
    gen timeline = cond(abs(forward) < abs(backward), forward, backward)
    Last edited by Wenhan Yan; 15 Jan 2023, 02:09.

  • #2
    Using dataex would have helped us. This may help you.


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(prim_key wv unemply timeline)
    1  3 0 -2
    1  4 0 -1
    1  5 1  0
    1  6 0  1
    2  1 1  0
    2  2 0  1
    2  3 0  2
    2  4 0  3
    2  5 0  4
    3  7 0  .
    3  8 0  .
    3  9 0  .
    3 10 0  .
    3 11 0  .
    end
    
    bysort prim_key (wv) : egen first = min(cond(unemply == 1, wv, .))
    bysort prim_key (wv) : gen wanted = sum(unemply == 0) if wv >= first 
    
    list, sepby(prim_key)
    
    gsort prim_key -wv 
    
    by prim_key: replace wanted = -sum(unemply == 0) if missing(wanted) & first < . 
    
    sort prim_key wv
    
    list, sepby(prim_key) 
         +-----------------------------------------------------+
         | prim_key   wv   unemply   timeline   first   wanted |
         |-----------------------------------------------------|
      1. |        1    3         0         -2       5       -2 |
      2. |        1    4         0         -1       5       -1 |
      3. |        1    5         1          0       5        0 |
      4. |        1    6         0          1       5        1 |
         |-----------------------------------------------------|
      5. |        2    1         1          0       1        0 |
      6. |        2    2         0          1       1        1 |
      7. |        2    3         0          2       1        2 |
      8. |        2    4         0          3       1        3 |
      9. |        2    5         0          4       1        4 |
         |-----------------------------------------------------|
     10. |        3    7         0          .       .        . |
     11. |        3    8         0          .       .        . |
     12. |        3    9         0          .       .        . |
     13. |        3   10         0          .       .        . |
     14. |        3   11         0          .       .        . |
         +-----------------------------------------------------+
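
    A possible shortcut, sketched under two assumptions: each person has at most one unemployment wave (as stated in #1), and the timeline should be measured in wave numbers. In that case the event wave can simply be subtracted from wv. The names event_wv and wanted2 are made up for this illustration, and only persons 1 and 3 are repeated to keep the example short.

    ```stata
    clear
    input byte(prim_key wv unemply timeline)
    1  3 0 -2
    1  4 0 -1
    1  5 1  0
    1  6 0  1
    3  7 0  .
    3  8 0  .
    end

    * wave of the single unemployment event; missing if never unemployed
    bysort prim_key : egen event_wv = min(cond(unemply == 1, wv, .))

    * signed distance in waves from the event; missing if there is no event
    gen wanted2 = wv - event_wv

    * check against the timeline the original poster asked for
    assert wanted2 == timeline
    ```

    Unlike the running count above, this measures distance in wave numbers rather than in observations, so the two can differ if a person skips a wave.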


    • #3
      Originally posted by Nick Cox
      Using dataex would have helped us. This may help you.


      Hello Nick,

      Thank you so much, and sorry for the late response. But what if there are missing values in the data? For example, suppose person 1 did not participate in wave 2 and so has a missing value for that wave.

      I tried your code, but it generates an incorrect timeline whenever an individual has at least one missing observation for unemployment.
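
      One possible reason, offered as a sketch rather than a diagnosis of the actual data: the code in #2 counts observations with unemply == 0, and a missing value does not satisfy that comparison, so the count does not advance across a wave with missing unemployment status. The person below is an assumed example with unemply missing in waves 4 and 6.

      ```stata
      clear
      input byte(prim_key wv unemply timeline)
      4 2 0 -3
      4 3 0 -2
      4 4 . -1
      4 5 1  0
      4 6 .  1
      end

      * same steps as in #2
      bysort prim_key (wv) : egen first = min(cond(unemply == 1, wv, .))
      bysort prim_key (wv) : gen wanted = sum(unemply == 0) if wv >= first
      gsort prim_key -wv
      by prim_key: replace wanted = -sum(unemply == 0) if missing(wanted) & first < .
      sort prim_key wv

      * show the rows where the running count disagrees with the intended timeline
      list if wanted != timeline, sepby(prim_key)
      ```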


      • #4
        Missing values were not mentioned before, but now that you have mentioned them, please give a data example with some missing values and explain what would be a correct calculation as far as you are concerned.


        • #5
          Here is a data example in which someone (prim_key 4) did not participate in some of the monthly surveys and so has missing values.

          I have two ways of treating the missing values in mind.

          The first is to treat them as employed, since I think people have a stronger incentive to return to the survey once they become unemployed, because they receive compensation when they participate.

          The second is to use the other, nonmissing observations to predict whether the person was unemployed during the missed survey month.
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte(prim_key wv unemply timeline)
          1  3 0 -2
          1  4 0 -1
          1  5 1  0
          1  6 0  1
          2  1 1  0
          2  2 0  1
          2  3 0  2
          2  4 0  3
          2  5 0  4
          3  7 0  .
          3  8 0  .
          3  9 0  .
          3 10 0  .
          3 11 0  .
          4  2 0 -3
          4  3 0 -2
          4  4 . -1
          4  5 1  0
          4  6 .  1
          end


          • #6
            Your first solution can be coded using a variable

            Code:
            gen work = unemply != 0
            that treats missing as if it were 1,

            and your second solution sounds like a good idea too.
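
            A minimal sketch of the two possible indicator codings, using only person 4 from #5. The names miss_as_1, miss_as_0, event_wv and timeline0 are invented for this illustration. If 1 means unemployed, then counting missing as 0 is the version that treats a skipped month as employed; which one is wanted is a judgment about the data.

            ```stata
            clear
            input byte(prim_key wv unemply timeline)
            4 2 0 -3
            4 3 0 -2
            4 4 . -1
            4 5 1  0
            4 6 .  1
            end

            * missing counted as 1: (unemply != 0) is true for 1 and for missing
            gen byte miss_as_1 = unemply != 0

            * missing counted as 0: (unemply == 1) is false for 0 and for missing
            gen byte miss_as_0 = unemply == 1

            * event wave and timeline under the missing-as-0 coding
            bysort prim_key : egen event_wv = min(cond(miss_as_0, wv, .))
            gen timeline0 = wv - event_wv

            list, sepby(prim_key)

            * agrees with the timeline column given in #5
            assert timeline0 == timeline
            ```

            Under the missing-as-1 coding the same person would show more than one apparent unemployment wave, so the event wave would need to be chosen some other way.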


            • #7
              Originally posted by Nick Cox
              Your first solution can be coded using a variable

              Code:
              gen work = unemply != 0
              that treats missing as if it were 1,

              and your second solution sounds like a good idea too.
              Hello Nick,

              Thank you so much for your help. I will try and see which one is better.
