  • First difference of Longitudinal Variables

    Good day everyone. I am fairly new to Stata and I have a problem working with longitudinal data. My data look as follows:
    Code:
    pid wave age bmi
    301012 1 66 27.5
    301012 5 75 27.2
    301013 1 26 21.5
    301013 5 35 17.6
    301015 1 46 37.4
    301015 5 55    .
    301016 1 28    .
    301016 5 37    .
    301017 1 31    .
    301017 5  .    .
    301018 1 52 41.3
    301018 5 62    .
    301019 1 25    .
    301019 5 35    .
    301025 1 46    .
    301025 5 55    .

    I wish to get the first difference (wave 5 minus wave 1) of each variable so I can compare changes over time for each pid. I am guessing I have to use a loop, but some direction would be of great help.
    Thank you

  • #2
    See
    Code:
    help tsset
    help tsvarlist



    • #3
      Welcome to Statalist.

      Expanding on Rich's advice, here is sample code.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long pid byte(wave age) float bmi
      301012 1 66 27.5
      301012 5 75 27.2
      301013 1 26 21.5
      301013 5 35 17.6
      301015 1 46 37.4
      301015 5 55    .
      301016 1 28    .
      301016 5 37    .
      301017 1 31    .
      301017 5  .    .
      301018 1 52 41.3
      301018 5 62    .
      301019 1 25    .
      301019 5 35    .
      301025 1 46    .
      301025 5 55    .
      end
      
      xtset pid wave, delta(4)
      generate d_age = D.age
      generate d_bmi = D.bmi
      list, sepby(pid)
      Code:
      . list, sepby(pid)
      
           +------------------------------------------------+
           |    pid   wave   age    bmi   d_age       d_bmi |
           |------------------------------------------------|
        1. | 301012      1    66   27.5       .           . |
        2. | 301012      5    75   27.2       9   -.2999992 |
           |------------------------------------------------|
        3. | 301013      1    26   21.5       .           . |
        4. | 301013      5    35   17.6       9        -3.9 |
           |------------------------------------------------|
        5. | 301015      1    46   37.4       .           . |
        6. | 301015      5    55      .       9           . |
           |------------------------------------------------|
        7. | 301016      1    28      .       .           . |
        8. | 301016      5    37      .       9           . |
           |------------------------------------------------|
        9. | 301017      1    31      .       .           . |
       10. | 301017      5     .      .       .           . |
           |------------------------------------------------|
       11. | 301018      1    52   41.3       .           . |
       12. | 301018      5    62      .      10           . |
           |------------------------------------------------|
       13. | 301019      1    25      .       .           . |
       14. | 301019      5    35      .      10           . |
           |------------------------------------------------|
       15. | 301025      1    46      .       .           . |
       16. | 301025      5    55      .       9           . |
           +------------------------------------------------+
      As you see, a loop is not strictly needed. But if you have a large number of variables for which a difference is sensible (for most categorical variables it will not be), the following loop is equivalent to the two generate commands above.
      Code:
      foreach var of varlist age-bmi {
          generate d_`var' = D.`var'
      }
      If you are going to be analyzing longitudinal data, you should prepare yourself by reading the introductory material in the Stata Longitudinal-Data/Panel-Data Reference Manual PDF included with your Stata installation and accessible from Stata's Help menu.
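
      If one row per pid is preferred, an alternative to the time-series operators is to reshape to wide layout and subtract directly. This is only a minimal sketch, assuming that exactly waves 1 and 5 are present and using the first rows of the example data; the names d_age and d_bmi are just illustrative.

      ```stata
      * Sketch: wave 5 minus wave 1 in wide layout (assumes only waves 1 and 5 exist)
      clear
      input long pid byte(wave age) float bmi
      301012 1 66 27.5
      301012 5 75 27.2
      301013 1 26 21.5
      301013 5 35 17.6
      301015 1 46 37.4
      301015 5 55    .
      end

      * One row per pid, with variables age1 age5 bmi1 bmi5
      reshape wide age bmi, i(pid) j(wave)

      * The difference is missing whenever either wave is missing
      foreach var in age bmi {
          generate d_`var' = `var'5 - `var'1
      }
      list pid d_age d_bmi
      ```

      Staying in long layout with xtset, as above, is usually more convenient once there are more than two waves.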



      • #4
        Thanks very much, Rich and William, for the quick response. Yes, I do have a large number of variables, some continuous and others categorical. The code worked perfectly. Thanks again.
        Last edited by Noel Mfongeh; 24 Feb 2020, 00:37.



        • #5
          In case anyone comes across this and is looking for seasonal differences: note that S2.x is x_t - x_(t-2), whereas D2.x is the second difference, (x_t - x_(t-1)) - (x_(t-1) - x_(t-2)) = x_t - 2*x_(t-1) + x_(t-2).
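
          A small check of that distinction, as a sketch on a made-up series (x = t^2; these values are illustrative and not from the thread). For this series the second difference is constant, while the lag-2 difference grows with t.

          ```stata
          * Toy series x = t^2 (hypothetical data, for illustration only)
          clear
          input byte t float x
          1  1
          2  4
          3  9
          4 16
          5 25
          end
          tsset t

          generate d2_x = D2.x    // second difference: x_t - 2*x_(t-1) + x_(t-2)
          generate s2_x = S2.x    // lag-2 (seasonal) difference: x_t - x_(t-2)
          list
          ```

          Both new variables are missing for t = 1 and t = 2, because each needs two earlier observations.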



          • #6
            I will add that anyone who comes across this should start by reading the documentation recommended in post #2 before attempting any code. Don't just guess about what D. means.

            Actually, when working with panel data, I usually spell out what I want with lags and leads, so rather than
            Code:
            generate d_age = D.age
            I typically would use
            Code:
            generate d_age = age - L.age
            or, for this example, if I had not included delta(4) on xtset
            Code:
            generate d_age = age - L4.age
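
            To see that these forms agree, here is a minimal sketch on the first four rows of the example data. The variable names d_age_a, d_age_b and d_age_c are only illustrative, and delta(1) is written out explicitly to make the second setting unambiguous.

            ```stata
            clear
            input long pid byte(wave age) float bmi
            301012 1 66 27.5
            301012 5 75 27.2
            301013 1 26 21.5
            301013 5 35 17.6
            end

            * With delta(4), one period is 4 units of wave
            xtset pid wave, delta(4)
            generate d_age_a = D.age
            generate d_age_b = age - L.age

            * With delta(1), wave 1 is four periods before wave 5
            xtset pid wave, delta(1)
            generate d_age_c = age - L4.age

            * All three agree, including where they are missing
            assert d_age_a == d_age_b
            assert d_age_b == d_age_c
            ```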
