Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Keep one record and delete another?

    I have 150 instances in my database of 100,000 records where 2 records reported during the same 1 minute (where "obs_min=0" and both have the same values for min=47 and date=331.366) and I would like to keep the record for the first location (outdoors in the sample data below) and delete the one with the second location (Embassy below).. I am using Stata 17.

    IDnum Session TimeActivity min date PM25 obs_min
    5 1 Outdoors 44 331.3639 61 1
    5 1 Outdoors 45 331.3646 74 1
    5 1 Outdoors 46 331.3653 51.5 1
    5 1 Embassy 47 331.366 0 1
    5 1 Outdoors 47 331.366 4 0
    5 1 Embassy 48 331.3667 0 1
    5 1 Embassy 49 331.3674 0 1
    5 1 Embassy 50 331.368 0 1

  • #2
    Lesile,
    Does this help?

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(IDnum Session) str20 TimeActivity float(min date PM25 obs_min)
    5 1 "Outdoors" 44 331.3639 61 1
    5 1 "Outdoors" 45 331.3646 74 1
    5 1 "Outdoors" 46 331.3653 51.5 1
    5 1 "Embassy" 47 331.366 0 1
    5 1 "Outdoors" 47 331.366 4 0
    5 1 "Embassy" 48 331.3667 0 1
    5 1 "Embassy" 49 331.3674 0 1
    5 1 "Embassy" 50 331.368 0 1
    end

    encode TimeActivity, gen(timact)
    tab timact
    tab timact, nolabel
    drop if timact == 1 & obs_min ==1 & min == 47
    list

    Comment


    • #3
      Noah thanks for your input. The code you listed would help if I only had the rows of data shown on the screen as an example, but my database has >100,000 rows of data with 20 instances like the one listed above.

      I would like to make Stata choose from the 2 rows of data recorded during the same 1 minute and to keep the row of data from the location where the person was first and delete the record where they were second. The 2 bolded rows of data below both have min=47 and the same date value, with second row having obs_min==0. I want to tell Stata:
      1. When obs_min=0 (2nd bolded row below) look at the “ta” variable 2 records higher and if that record has the same microenvironment (ta is the numeric variable name for the string TimeActivity) as that higher row then keep this row but if not delete it (in the example below I want to keep the Outdoors row because the person was outdoors first then went into the Embassy)
      2. Then to look at the row above obs_min=0 (the person was in the Embassy) and look 1 record higher and if they have the same ta then keep the row but if not delete it
      Thank you for any advice you have on this! My brain is twisted in a knot trying to organize how to tell Stata to do this.
      IDnum Session TimeActivity min date PM25 obs_min FixedPM min0 ta
      5 1 Outdoors 44 331.3639 61 1 121 468524 1
      5 1 Outdoors 45 331.3646 74 1 121 468525 1
      5 1 Outdoors 46 331.3653 51.5 1 121 468526 1
      5 1 Embassy 47 331.366 3 1 121 468527 2
      5 1 Outdoors 47 331.366 44 0 121 468527 1
      5 1 Embassy 48 331.3667 0 1 121 468528 2
      5 1 Embassy 49 331.3674 0 1 121 468529 2

      Comment


      • #4
        you may try the following (provided people were in at most two locations),
        Code:
        by IDnum Session (min obs_min), sort: gen first_location = TimeActivity == TimeActivity[1]
        by IDnum Session min (first_location), sort: keep if _n==_N
        Last edited by Øyvind Snilsberg; 08 Feb 2022, 07:42.

        Comment


        • #5
          You might want to check out Chuck Huber's video on "How to identify and remove duplicate observations".

          https://www.youtube.com/watch?v=433G...1j82j&index=12

          Or, just read up on the duplicates command.
          -------------------------------------------
          Richard Williams, Notre Dame Dept of Sociology
          StataNow Version: 18.5 MP (2 processor)

          EMAIL: [email protected]
          WWW: https://www3.nd.edu/~rwilliam

          Comment


          • #6
            Ovind and Richard thank you for your input! I ended up using a combination of both of your advice. Ovind's code ended up delete all instances where study participants were in the same location, as opposed to just those instances in the same 1 minute. Chuck Huber's video was very helpful and I think the following code generated after watching Chuck's youtube video is a solution to the situation:

            duplicates report IDnum Session doy hr min
            duplicates drop IDnum Session doy hr min if _n!=_N, force

            tab obs_min
            replace obs_min = 1 if obs_min==0

            Comment

            Working...
            X