How to summarize multiple observations per ID?

Susanne W

Join Date: May 2015

Posts: 2
#1

21 May 2015, 06:48

Company_ID Bank_type

1 1
1 3
2 5
3 9
3 1
3 2
4 3
4 3
5 1
6 1
6 1
7 4
7 7
7 1
7 2
7 3

Hi,

I have a data set that looks like the one above. It contains different firms, each identified by Company_ID. These firms are customers of different types of banks; some have just one bank, others have several. I would like to create a new variable (bank_code) that assigns each firm exactly one number encoding its combination of bank types. So this new variable should be 13 for firm 1, 5 for firm 2, 129 for firm 3, and so on.

Unfortunately, even after searching the forum and Google for some hours, I still have no idea how to do this in Stata. Can the collapse command help, or is there some way with egen? Any help would be greatly appreciated! Thanks a lot!
Nick Cox

Join Date: Mar 2014

Posts: 35405
#2

21 May 2015, 06:56

Something like

Code:

bysort Company_ID (Bank_type) : gen Types = string(Bank_type) if _n == 1
by Company_ID : replace Types = Types[_n-1] + cond(Bank_type != Bank_type[_n-1], string(Bank_type), "") if _n > 1
by Company_ID: replace Types = Types[_N]
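
For readers who want to try this on the example data, here is a minimal self-contained sketch (an illustration, not part of the original reply). It enters the sixteen observations from #1, runs the three lines, and checks a few results with assert. The numeric variable bank_code built with real() is an assumption about what the question wanted, since Types itself is a string variable.

```stata
clear
input byte Company_ID byte Bank_type
1 1
1 3
2 5
3 9
3 1
3 2
4 3
4 3
5 1
6 1
6 1
7 4
7 7
7 1
7 2
7 3
end

* sort bank types within firm; start the string with the smallest type
bysort Company_ID (Bank_type) : gen Types = string(Bank_type) if _n == 1
* append each later type, skipping repeats of the previous (sorted) value
by Company_ID : replace Types = Types[_n-1] + cond(Bank_type != Bank_type[_n-1], string(Bank_type), "") if _n > 1
* the last observation holds the full combination; copy it to every row
by Company_ID : replace Types = Types[_N]

* assumption: a numeric copy of the string, as asked for in #1
gen long bank_code = real(Types)

* spot checks against the values named in the question
assert Types == "13"    if Company_ID == 1
assert Types == "5"     if Company_ID == 2
assert Types == "129"   if Company_ID == 3
assert Types == "3"     if Company_ID == 4
assert Types == "12347" if Company_ID == 7
```

Note that simply joining digits is unambiguous only while every bank type is a single digit; with a type such as 13, the string "13" could no longer be told apart from types 1 and 3, and a separator would be needed.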

By the way, you are asked to use a full real name here with given name and family name.

Last edited by Nick Cox; 21 May 2015, 07:01.
Susanne W

Join Date: May 2015

Posts: 2
#3

21 May 2015, 07:05

Exactly what I was looking for, thanks a lot!
Shivani Pandey

Join Date: Dec 2020

Posts: 7
#4

17 Dec 2020, 23:42

Hi,

I have a similar problem. My data looks like this:

familyid allHNames

1 Rick
1 Matt Jones
1 Shivani Pandey
2 AK
2 Balbir
3 Rick
4 Rohan Merkel
4 Blair Woldorf
4 Rishabh Jain
4 BP

and I want to create a dataset which looks like this:

familyid  allHNames       allNames

1         Rick            Rick Matt Jones Shivani Pandey
1         Matt Jones      Rick Matt Jones Shivani Pandey
1         Shivani Pandey  Rick Matt Jones Shivani Pandey
2         AK              AK Balbir
2         Balbir          AK Balbir
3         Rick            Rick
4         Rohan Merkel    Rohan Merkel Blair Woldorf Rishabh Jain BP
4         Blair Woldorf   Rohan Merkel Blair Woldorf Rishabh Jain BP
4         Rishabh Jain    Rohan Merkel Blair Woldorf Rishabh Jain BP
4         BP              Rohan Merkel Blair Woldorf Rishabh Jain BP

I used the above code like this:

sort familyid allHNames

bysort familyid(allHNames): gen allNames = string(allHNames) if _n == 1
by familyid : replace allNames = allNames[_n-1] + cond(allHNames != allHNames[_n-1], string(allHNames), "") if _n > 1
by familyid: replace allNames = allNames[_N]

But I get an error saying "type mismatch" just after the "bysort.." command line. I appreciate any help here!
Nick Cox

Join Date: Mar 2014
Posts: 35405

18 Dec 2020, 01:46

Code:

by familyid : replace allNames = allNames[_n-1] + cond(allHNames != allHNames[_n-1], allHNames, "") if _n > 1
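
The "type mismatch" comes from string(), which converts a numeric argument to a string; allHNames is already a string variable, so the call has to be dropped from the first line in #4 as well as from the line above. The following is a minimal sketch under some assumptions, not a definitive solution: names are joined with single spaces, the original order within each family is kept (as in the desired output in #4, which is not alphabetical), and no name is repeated within a family. The variable obsno is a hypothetical helper that does not appear in the thread.

```stata
clear
input byte familyid str20 allHNames
1 "Rick"
1 "Matt Jones"
1 "Shivani Pandey"
2 "AK"
2 "Balbir"
3 "Rick"
4 "Rohan Merkel"
4 "Blair Woldorf"
4 "Rishabh Jain"
4 "BP"
end

* obsno (hypothetical helper) records the original order of the observations
gen long obsno = _n

* start each family's list with its first name; no string() is needed
bysort familyid (obsno) : gen allNames = allHNames if _n == 1
* append each later name, separated by a space
by familyid : replace allNames = allNames[_n-1] + " " + allHNames if _n > 1
* the last observation holds the complete list; copy it to every row
by familyid : replace allNames = allNames[_N]

list familyid allHNames allNames, sepby(familyid) noobs
```

Sorting on allHNames instead, as in #4, would list the names alphabetically and allow the cond() comparison with the previous observation to skip duplicates; sorting on obsno trades that for keeping the input order.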
