How to combine observations?

Kris Bitney

Join Date: Feb 2017

Posts: 33
#1

How to combine observations?

06 Feb 2017, 09:47

Hi everyone,

I have some panel data and I need to combine observations that share an id and a date. The id is stored in a variable ("unique_id"). Each id may have multiple dates, and so will have multiple records (i.e. longitudinal data). However, right now I have multiple observations with the same id and same date, but with different values on some dummy variables. I want to combine all of the observations that share an id and a date in such a way that if at least observation has a value of "1" in a dummy variable, the combined variable will also have a value of "1" in that dummy.

Do you have any suggestions?

Thanks,

Kris
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#2

06 Feb 2017, 10:34

OK. So let's give that "dummy" variable a name. I'll call it x. Let's also name your date variable. I'll call it date. You want to thin your data to a single observation per combination of unique_id and date, with the combined value of x being 1 if any of the x's is 1, and zero otherwise.

Code:

by unique_id date, sort: egen new_x = max(x == 1) by unique_id date: keep if _n == 1 drop x rename new_x x

Note: You don't say what you want to do if x is missing for every observation for a given unique_id and date. The above code sets the combined value to zero, not to missing. Perhaps this situation never arises in your data. If it does, and if the proper solution is to set x to missing value in this circumstance, then it would be as follows:

Code:

by unique_id date, sort: egen new_x = max(x==1) by unique_id date,: egen all_missing = min(missing(x)) by unique_id date: keep if _n == 1 replace new_x = . if all_missing == 1 drop x all_missing rename new_x x
1 like
Comment
Kris Bitney

Join Date: Feb 2017

Posts: 33
#3

06 Feb 2017, 12:29

Worked perfectly. Thanks!!
Comment
Adithya Kolloju

Join Date: Jul 2022

Posts: 6
#4

04 Jul 2022, 02:25

Hello everyone,
I have panel data from 2001 to 2020 where I want to combine certain observations as shown below

Associate
Associate confirmed
Associate not confirmed
Researchers
Researchers A
Researchers B
Researchers tenured
Researchers non-tenured
Ordinary professors
Full professors
Extraordinary professors

Here each observation has a panel dataset from 2001 to 2020 with various universities. I want to combine observations with years and university names. something summing up like: Associate (Associate, the associate confirmed, and Associate not confirmed), likewise with other observations.
Is there any solution for this?

Thank you.
Comment
Denise Vella

Join Date: Aug 2022

Posts: 187
#5

07 Dec 2022, 03:28

Clyde Schechter Thanks

I've been using similar code (see below) - but I need to copy and paste this for 20 times for different variables(see bold) eg Dvt, MI, heartfailure etc.
Is there a shorter way to do this ?Rather than having to do this for 20 times.
Many thanks

bys hospitalid2 date: egen new_var = max(dvt == 1)
by hospitalid2 date: egen all_missing = min(missing(dvt))
by hospitalid2 date: keep if _n == 1
replace new_var = . if all_missing == 1
drop dvt all_missing

rename new_var dvt

Last edited by Denise Vella; 07 Dec 2022, 04:22.
Comment
Nick Cox

Join Date: Mar 2014

Posts: 35432
#6

07 Dec 2022, 04:13

Code:

by unique_id date, sort: egen new_x = max(x == 1) by unique_id date, sort: egen new_x = max(x == 1) by unique_id date: keep if _n == 1 For each set which have the same ID and date - keep the first observation....

This means generate a new variable new_x.

When variable x == 1, find the maximum when they have the same unique_id and date.

For each set which have the same ID and date - keep the first observation....

#5 That starts right and gets a little confused.

x == 1 is evaluated as 1 if x is 1 and as 0 otherwise.

Taking the maximum over possible 1s and 0s -- for each (identifier, date) combination -- yields 1 if any value is 1 and 0 otherwise (if all values are 0).

So for each (identifier, date) combination we have either all 1s or all 0s, and since they are all the same we need only keep one observation for our purpose. There are two systematic ways to keep one observation -- whenever several are identical -- to keep the first or to keep the last. It doesn't matter which we use -- it's like saying we have 1 1 1 1 1 1 here and 0 0 0 there, so which should we chose of the 1s and the 0s. You need not care.

Here a demonstration of a very simple case. Adding dates doesn't change the main idea.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input float(id x) 1 1 1 0 1 0 2 0 2 0 2 0 end bysort id : egen any1 = max(x == 1) list, sepby(id) +---------------+ | id x any1 | |---------------| 1. | 1 1 1 | 2. | 1 0 1 | 3. | 1 0 1 | |---------------| 4. | 2 0 0 | 5. | 2 0 0 | 6. | 2 0 0 | +---------------+ bysort id : keep if _n == 1 l id any1 +-----------+ | id any1 | |-----------| 1. | 1 1 | 2. | 2 0 | +-----------+

For the correspondence any <=> max, all <=> min, see also https://www.stata.com/support/faqs/d...ble-recording/

.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 29956
#7

08 Dec 2022, 10:01

I've been using similar code (see below) - but I need to copy and paste this for 20 times for different variables(see bold) eg Dvt, MI, heartfailure etc.
Is there a shorter way to do this ?Rather than having to do this for 20 times.

Code:

foreach v of varlist Dvt MI heartfailure /*etc.*/ { by hospitalid2 date, sort: egen byte has_`v' = max(`v' == 1) by hospitalid2 date: gen byte all_missing = missing(`v'[1]) replace has_`v' = . if all_missing drop all_missing }
Comment

Announcement

How to combine observations?

Comment

Comment

Comment

Comment

Comment

Comment