  • How to detect duplicates across two variables for longitudinal data?

    I'm looking for help with observations that are duplicated on only two variables.
    I started with duplicates list and then duplicates drop to remove duplicates, but when I tried to reshape the longitudinal data from long to wide form, Stata gave me this error message:

    values of variable year not unique within id
    Your data are currently long. You are performing a reshape wide. You specified i(id) and j(year). There are observations within
    i(id) with the same value of j(year). In the long data, variables i() and j() together must uniquely identify the observations.

    I ran reshape error, which listed many observations duplicated on id and year alone, but I can't figure out how to remove them. The dataset is large, so the full list is not displayed and I can't drop the duplicates one by one. I would also like to inspect these duplicates to see where they differ on the other variables, since the original duplicates drop command did not remove them.

    Any suggestions?

  • #2
    Getting unique combinations of id and year is easy enough, but it may mask important details in your data. Why do you have duplicates? What do observations in your dataset represent? If the command

    Code:
    duplicates drop *, force
    did not eliminate all duplicates, there is at least one observation with the same id and year but different values in another variable. The best-case scenario is that missing values are causing this issue. This would be apparent if running the command

    Code:
    isid id year
    produces the error, 'values of id should never be missing.' Otherwise, you must identify the reason for the duplicates. That said, the quickest way to resolve this is by running

    Code:
    bysort id year: keep if _n==1
    but I do not recommend doing this before determining why you have duplicates.
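
    To work out why the duplicates exist before dropping anything, something along these lines may help. It is only a sketch: it assumes the identifiers are named id and year as in the question, and dup and _differs are hypothetical variable names that must not already exist in the dataset. Run it from a do-file so the loop and its local macros execute together.

    Code:
    ```stata
    * Summarize how many observations share each id-year pair
    duplicates report id year

    * Flag observations whose id-year pair occurs more than once
    * (dup is a hypothetical name; it holds the number of extra copies)
    duplicates tag id year, generate(dup)

    * Inspect the flagged observations side by side in the Data Browser
    sort id year
    browse if dup > 0

    * List the variables that take different values within a duplicated id-year pair
    ds id year dup, not
    foreach v in `r(varlist)' {
        * sorting by `v' within each pair makes first != last exactly when `v' varies
        bysort id year (`v'): generate byte _differs = (`v'[1] != `v'[_N])
        quietly count if _differs & dup > 0
        if r(N) > 0 display "`v' varies within id-year in " r(N) " observations"
        drop _differs
    }

    * Check whether missing identifiers are part of the problem
    count if missing(id, year)
    ```
    The variables reported by the loop are the ones that kept duplicates drop from removing these observations, so they are the place to look when deciding which record to keep.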
