  • Dropping duplicates of a variable only when they do not come from the same company

    I have a large dataset with many duplicates of the variable "token" (see below). Duplicates arise either because the same company had several transactions or because two or more companies share the same token name. I want to drop the observations in the latter case and keep them only when the duplication comes from a single company.

    Any help will be appreciated!
    Last edited by Olivia Johns; 18 Dec 2021, 10:28.

  • #2
    It will astonish me if anyone can answer this question. What, in the data, enables you to distinguish the two situations? You need to explain that. And you also need to show example data using the -dataex- command.

    If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    • #3
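      If a variable company enables you to distinguish the two situations, then perhaps the following, which sorts company within token so that the first and last values differ exactly when a token is shared by more than one company:
      Code:
      bysort token (company): gen drop = company[1] != company[_N]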
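
      A minimal sketch of the whole workflow on made-up data, since no example data were posted. The variable names token and company, the toy values, and the helper variable shared are all assumptions for illustration, and company is assumed to be a string identifier:
      Code:
      clear
      input str3 token str10 company
      "ABC" "Alpha"
      "ABC" "Alpha"
      "XYZ" "Beta"
      "XYZ" "Gamma"
      "XYZ" "Beta"
      "QRS" "Delta"
      end

      * sort company within token; first and last differ only if
      * the token is used by more than one company
      bysort token (company): gen byte shared = company[1] != company[_N]

      * remove every observation of a shared token, then the helper
      drop if shared
      drop shared
      list, sepby(token)

      Here the three XYZ observations are dropped because two companies use that token, while the repeated ABC observations are kept because they belong to one company. If company can be missing, it is worth deciding first whether a missing value should count as a different company.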
