Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keep only most recent measure with panel data

    Hello all. I have a long-format database that looks like this:
    id wave var1 var2
    1 1 85 .
    1 2 87 .
    1 3 . .
    1 4 . 5
    2 1 . .
    2 2 92 .
    2 3 76 .
    2 4 . 8
    I need a variable that is generated using var1 and var2 (let's call it var3). Now, I would only keep observations from wave 4 because that's where I have the measurement for var2, but that would be useless because I wouldn't have any measurement for var1 in wave 4.

    So my question is, is there a way to generate a new variable (let's say var1_new) that is equal to the most recent measurement of var1 for each id? So that I would have something like the following:
    id wave var1 var2 var1_new
    1 4 . 5 87
    2 4 . 8 76
    Any help is appreciated, thank you.
    Last edited by Luna Minerva; 04 Mar 2019, 09:11.

  • #2

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(id wave var1 var2)
    1 1 85 .
    1 2  . .
    1 3 87 .
    1 4  . 5
    2 1  . .
    2 2 92 .
    2 3 76 .
    2 4  . 8
    end
    
    
    bysort id (wave) : gen var1_new = cond(var1 < ., var1, var1[_n-1])
    
    
    list, sepby(id)
    
         +------------------------------------+
         | id   wave   var1   var2   var1_new |
         |------------------------------------|
      1. |  1      1     85      .         85 |
      2. |  1      2      .      .         85 |
      3. |  1      3     87      .         87 |
      4. |  1      4      .      5         87 |
         |------------------------------------|
      5. |  2      1      .      .          . |
      6. |  2      2     92      .         92 |
      7. |  2      3     76      .         76 |
      8. |  2      4      .      8         76 |
         +------------------------------------+
    
    by id : keep if _n == _N

    Comment


    • #3
      Hi Nick,

      I have a problem working with panel data.

      I was wondering if you could help me figuring out how to keep only the last 5 observations (hence dropping the rest) for each company (unique_id) for every date (file_date).
      Since I have the data arranged in a panel data format, I tried to run the following command:

      bysort FileDate Cusip SharesHeldatEndofQtr: keep in -5/-1
      'in' may not be combined with 'by' r(190);

      BUT IT DOES NOT WORK and I get this error ('in' may not be combined with 'by' r(190). I managed to sort the data in the correct order (first by date, then by company_id and lastly by ownership size) but I am having trouble in dropping the observations which are not in the top 5 (in this case the bottom 5 as sort works in ascending order) because the in range option apparently does not work with the by command. Do you know of any solution?

      Thanks in advance for the help!

      Comment


      • #4
        # 3: duplicate post https://www.statalist.org/forums/for...ownership-data.
        Please do not bump (as per FAQ). Thanks.
        Kind regards,
        Carlo
        (StataNow 18.5)

        Comment

        Working...
        X