Counting the number of different firms a person visits over time

Brooke Claypole

Join Date: Nov 2019

Posts: 4
#1


01 Mar 2023, 15:18

I'm using -generate- with _n and getting a result I know is wrong, so I must be misunderstanding something. For example, I tried:
sort ID v1
by ID v1: generate v2 = _n

Instead of counting firms, v2 ends up numbering the observations within each combination of ID and v1 (1, 2, ... for repeat visits to the same firm), which is not what I'm trying to achieve. I am using Stata/MP 17.0.
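A sketch that reruns those two commands on the example data shown below makes the pattern visible. Under the -by ID v1:- prefix, _n restarts at 1 in every ID-v1 group, so it counts repeat visits to the same firm rather than distinct firms per person. The -input- lines exist only to make the sketch self-contained.

```stata
* Recreate the example data from this post
clear
input byte(ID v1)
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3
end

* _n restarts at 1 in each ID-v1 group, so v2 numbers visits to the same firm
sort ID v1
by ID v1: generate v2 = _n

* v2 is 2 only for the second (ID 2, firm 1) and (ID 3, firm 2) rows; all others are 1
list, sepby(ID)
```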

I have 41,353 observations and 1,000 variables, but the example below is enough to show the problem. ID is the person and v1 identifies the firm. I also want v2, the number of different firms each person visits, and v3, the firm each person visited, where persons who visit more than one firm get a value labeled "Multiple Firms". My data look like this:
ID v1
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3

I would like to create a variable called v2 that counts the number of different values of v1 for each ID. For example:
ID v1 v2
1 1 1
2 1 1
2 1 1
3 2 1
3 2 1
4 2 2
4 3 2
5 1 3
5 2 3
5 3 3

Anyone have ideas for how I might achieve this?

Ultimately, I will use v2 to create a third variable, v3, for my wide-shaped dataset, which has only one row per ID. From v3 I want a table showing how many unique IDs fall under each firm, with everyone who visited more than one firm pooled as "Multiple Firms". For example:

Narrow dataset (99999 = "Multiple Firms")
ID v1 v2 v3
1 1 1 1
2 1 1 1
2 1 1 1
3 2 1 2
3 2 1 2
4 2 2 99999
4 3 2 99999
5 1 3 99999
5 2 3 99999
5 3 3 99999

Wide Dataset
ID v3
1 1
2 1
3 2
4 99999
5 99999

The table I would ultimately like to create:
v3              frequency   %
Firm 1          2           40%
Firm 2          1           20%
Multiple Firms  2           40%
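A minimal sketch of the remaining steps (v3, one row per ID, then the table), assuming v1 is a numeric firm code as in the example; if v1 is really a string firm name, v3 would have to be a string as well. The value-label name firmlbl and the labels "Firm 1" to "Firm 3" are hypothetical. The v2 step uses a running sum of "v1 changed" indicators within each ID.

```stata
* Recreate the example data (self-contained illustration)
clear
input byte(ID v1)
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3
end

* v2: within each ID sorted by v1, add 1 whenever v1 changes, then spread the total
by ID (v1), sort: generate byte v2 = sum(v1 != v1[_n-1])
by ID (v1): replace v2 = v2[_N]

* v3: the firm code for single-firm persons, 99999 for multiple firms
* (long, because 99999 exceeds the range of int)
generate long v3 = cond(v2 == 1, v1, 99999)
label define firmlbl 1 "Firm 1" 2 "Firm 2" 3 "Firm 3" 99999 "Multiple Firms"
label values v3 firmlbl

* Wide: v3 is constant within ID, so keep one row per person
by ID: keep if _n == 1
keep ID v3

* Frequency and percent of persons by v3
tabulate v3
```

The -keep ID v3- line only trims the illustration; in the full dataset you would keep whatever other person-level variables you need before reshaping or merging.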

Last edited by Brooke Claypole; 01 Mar 2023, 15:25.
Clyde Schechter

Join Date: Apr 2014

Posts: 30097
#2

01 Mar 2023, 15:36

Code:

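```stata
* Example generated by -dataex-. For more info, type help dataex
clear
input byte(id v1)
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3
end

* Within each id, sorted by v1, add 1 each time v1 changes (the first obs always counts)
by id (v1), sort: gen v2 = sum(v1 != v1[_n-1])
* Copy the final running total to every observation of that id
by id (v1): replace v2 = v2[_N]
```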
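An alternative sketch, not taken from the reply above, uses -egen-'s tag() function, which marks exactly one observation in each distinct id-v1 combination with 1 and the rest with 0. Totaling those tags within id gives the number of distinct firms per person without relying on a manual sort. The helper variable name firsttag is illustrative.

```stata
clear
input byte(id v1)
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3
end

* 1 for one observation per distinct id-v1 pair, 0 otherwise
egen byte firsttag = tag(id v1)
* Number of distinct firms per person
egen v2 = total(firsttag), by(id)
drop firsttag
list, sepby(id)
```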

In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
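As a sketch of the usage described above: with your own data loaded, you list the variables relevant to the question and, optionally, limit the observations with -in- or -if-. The -input- lines below only stand in for "your data" so the sketch runs on its own; the -ssc install- line is commented out because it is needed only on older Stata versions.

```stata
* Only needed if -dataex- is not already part of your Stata
* ssc install dataex

* Stand-in for your own loaded dataset
clear
input byte(ID v1)
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3
end

* Produce a pasteable example of the first 10 observations of ID and v1
dataex ID v1 in 1/10
```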
Brooke Claypole

Join Date: Nov 2019

Posts: 4
#3

10 Mar 2023, 09:17

This worked great. Thank you, Clyde!