
  • Dropping duplicate time measurements in panel data but selecting which duplicate to delete

    I am using Stata 17 and would like your advice on Stata code that almost solves my problem: when two panel data recordings fall in the same minute, I need to choose which one to keep and which to delete. I collapsed a data file of panel data recorded every 30 seconds, but 158 instances remain where the collapsed dataset includes two records for the same minute in two different locations (e.g., if the person went from their home to the outdoors during 08:01, I have a record for home at 08:01 and another for outdoors at 08:01). I would like to keep the record for the first location (home, in the example above).

    This code creates a flag for any recordings within the same minute (i.e., with the six variables ID, Session, year, doy, hr, and min all equal):
    Code:
    duplicates tag ID Session year doy hr min, generate(flag_dupe)
    tab flag_dupe
    And the following code uses the location variable named timeactivity (an integer variable) to flag a change in location relative to each of the prior three observations (note the sort order is fixed inside the bysort so the lags are well defined):
    Code:
    bysort ID Session (year doy hr min): gen int flag_ta = timeactivity - timeactivity[_n-1]
    bysort ID Session (year doy hr min): gen int flag_ta2 = timeactivity - timeactivity[_n-2]
    bysort ID Session (year doy hr min): gen int flag_ta3 = timeactivity - timeactivity[_n-3]
    Some example data are below; note the two entries at minute 30 as well as at minute 35.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str7 ID double Session float(year doy hr min timeactivity) byte flag_dupe int(flag_ta flag_ta2 flag_ta3)
    "Kat 005" 2 2020 32 14  7 6 0  0  0  0
    "Kat 005" 2 2020 32 14  8 6 0  0  0  0
    "Kat 005" 2 2020 32 14 11 6 0  0  0  0
    "Kat 005" 2 2020 32 14 12 6 0  0  0  0
    "Kat 005" 2 2020 32 14 30 1 1 -5 -5 -5
    "Kat 005" 2 2020 32 14 30 6 1  5  0  0
    "Kat 005" 2 2020 32 14 32 1 0 -5  0 -5
    "Kat 005" 2 2020 32 14 35 1 1  0 -5  0
    "Kat 005" 2 2020 32 14 35 4 1  3  3 -2
    "Kat 005" 2 2020 32 14 36 4 0  0  3  3
    end
    label values timeactivity tactive
    label def tactive 1 "Car", modify
    label def tactive 4 "Home", modify
    label def tactive 6 "Outdoors", modify
    Thank you in advance for your help! Best, Leslie

  • #2
    Code:
    * Generate an internal sequence, assuming these data are already in chronological order:
    gen seq = _n
    
    * Generate a mini-sequence of activities within a minute:
    bysort ID Session year doy hr min (seq): gen mseq = _n
    
    * Keep only the first instance:
    keep if mseq == 1
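
    A quick way to confirm the drop worked is to re-run a duplicates check on the same key variables used to flag the duplicates (a sketch, assuming those keys):
    Code:
    * Every observation should now occur exactly once on the minute key
    duplicates report ID Session year doy hr min
    assert r(unique_value) == r(N)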



    • #3
      Leslie:
      another approach can be:
      Code:
      bysort ID Session year doy hr: gen check=1 if min==min[_n-1] | min==min[_n+1]
      
      bysort ID Session year doy hr min: gen wanted=sum(check) if check!=.
      
      drop if wanted>1 & wanted!=.
      
      drop check wanted
      
      list
      Kind regards,
      Carlo
      (StataNow 18.5)



      • #4
        Ken - thank you for suggesting the code. Unfortunately, it selects only the first of the two observations recorded in the same minute, and the ordering priority is (unfortunately) the alphabetical order of the timeactivity location names, with timeactivity coded as 1 "Car" 2 "Cooking" 3 "Embassy" 4 "Home" 5 "Indoor Other" 6 "Outdoors". This is unfortunate, as I have no reason to prefer keeping data reported while the study participant is in the car over data reported while they are cooking. In fact, another option for choosing which of the two observations to keep is to prioritize the study locations of greatest interest, with home > cooking > embassy > indoor other > car > outdoors, since this is an indoor air quality analysis.

        Maybe the solution is as simple as recoding the timeactivity integer assignments into the priority order I mentioned, with 1 = home, 2 = cooking, 3 = embassy, and so on.

        Thanks for any advice that you have on this analysis! Best, Leslie
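
        One way to implement that priority ordering without recoding timeactivity itself (which would invalidate the existing value labels) is to build a separate rank variable and keep the top-ranked record within each minute. A sketch, assuming the timeactivity coding listed above; the priority variable and its mapping are illustrative:
        Code:
        * Map each timeactivity code to a priority rank (1 = most preferred)
        gen byte priority = .
        replace priority = 1 if timeactivity == 4  // Home
        replace priority = 2 if timeactivity == 2  // Cooking
        replace priority = 3 if timeactivity == 3  // Embassy
        replace priority = 4 if timeactivity == 5  // Indoor Other
        replace priority = 5 if timeactivity == 1  // Car
        replace priority = 6 if timeactivity == 6  // Outdoors
        
        * Within each minute, sort by priority and keep the top-ranked record
        bysort ID Session year doy hr min (priority): gen mseq2 = _n
        keep if mseq2 == 1
        drop priority mseq2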

        Comment
