How to combine multiple observations

Yoonseo Jo

Join Date: Apr 2022

Posts: 2
#1

How to combine multiple observations

16 Apr 2022, 05:37

Hello,

I have a dataset that has a unique household and individual id.
However, there are multiple entries in the dataset for each household/individual id.
How can I combine these so there would be one observation for each of unique household/individual id?

Thank you so much!

Yoonseo
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29813
#2

16 Apr 2022, 12:22

Well, in the example you show, the replicate observations having the same household and personal id either have missing values on all other variables, or, have non-missing values which agree on all of these observations. If this pattern is true throughout the data set, you can reduce this to one observation per person by running:

Code:

ds hhid14_9 pid14, not collapse (firstnm) `r(varlist)', by(hhid14_9 pid14)

However, before you do that, you should really verify that there really aren't any replicate observations where there are non-missing values that disagree. Because if there are, then you have to have someway of either deciding which one is right, or selecting some way of combining the disagreeing values into a single resolved value. To check for disagreement:

Code:

ds, has(type string) local string_vars `r(varlist)' foreach s of local string_vars { by hhid14_9 pid14 (`s'), sort: assert `s' == `s'[_N] | missing(`s') } ds `string_vars', not local numeric_vars `r(varlist)' foreach n of local numeric_vars { by hhid14_9 pid14 (`n'), sort: assert `n' == `n'[1] | missing(`n') }

If this code runs without producing any "assertion is false" messages, then your data are suitable for the -collapse- approach shown. If not, you have data contradictions to reconcile. As for the best way to reconcile those contradictions, that depends on the substance and science of what you are doing.

I would add yet another caution here. If the data does pass this test, and you can safely run the -collapse-, you need to ask why the data set contains all these extra, uninformative observations. It often means that something went wrong in the data management that created that data set. And if there are errors that led to this obvious problem, there may also be other errors that need to be fixed in order to produce an accurate, usable data set. So before proceeding with further analysis, this would be a good time to review in detail how this data set was created and fix any errors that were made.

The code shown here is not tested because it is not possible to import data from a screenshot into Stata. In the future, when showing data examples, please use the -dataex- command to do so. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Comment

Announcement

How to combine multiple observations

Comment