  • Counting the number of different firms a person visits over time

    I’m using -generate- with _n and getting a result I know is wrong, so I must be misunderstanding something. For example, I tried:
    sort ID v1
    by ID v1: generate v2 = _n

    Instead, v2 ends up numbering the observations within each combination of ID and v1 (restarting at 1 in every such group), which is not what I’m trying to achieve. I am using Stata/MP 17.0.
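To see what that attempt actually does, here is a minimal sketch that retypes the example data from the post by hand and runs the same two commands. Because `by ID v1:` makes _n restart at 1 in every ID-v1 group, v2 only counts repeated rows for the same firm: ID 5, for instance, visits three firms but gets v2 = 1 on every row.

```stata
* Example data retyped from the post (ID = person, v1 = firm code)
clear
input byte(ID v1)
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3
end

* The original attempt: _n is the row number *within* each ID-v1 group,
* so it numbers duplicate firm rows rather than counting distinct firms
sort ID v1
by ID v1: generate v2 = _n

* Inspect the result, with a separator line between IDs
list, sepby(ID)
```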



    I have 41,353 observations and 1,000 variables, but the example below is enough to show the problem. My data look like this, where ID identifies the person and v1 identifies the firm. The variables I want to create are v2, the number of different firms each person visits, and v3, the firm each person visited (persons who visited more than one firm get a value labeled "Multiple Firms"):
    ID v1
    1 1
    2 1
    2 1
    3 2
    3 2
    4 2
    4 3
    5 1
    5 2
    5 3

    I would like to create a variable called v2 that counts the number of different values of v1 for each ID. For example:
    ID v1 v2
    1 1 1
    2 1 1
    2 1 1
    3 2 1
    3 2 1
    4 2 2
    4 3 2
    5 1 3
    5 2 3
    5 3 3

    Anyone have ideas for how I might achieve this?
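One standard way to get v2, sketched here as an alternative to the running-sum approach in reply #2: -egen, tag()- marks exactly one observation per distinct ID-v1 pair, and adding up those tags within each ID gives the number of distinct firms. The data are retyped from the post, and the variable name firm_tag is hypothetical.

```stata
* Example data retyped from the post (ID = person, v1 = firm code)
clear
input byte(ID v1)
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3
end

* firm_tag = 1 on one observation of each distinct ID-v1 pair, 0 elsewhere
egen byte firm_tag = tag(ID v1)

* Summing the tags within each ID counts the distinct firms for that ID
by ID, sort: egen v2 = total(firm_tag)

list, sepby(ID)
```

Unlike `_n`, this does not depend on how the rows are ordered within an ID.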

    Ultimately, I will use additional steps to create a third variable, v3, from the information in v2. v3 is meant to go into my wide-shaped dataset, which has only one row per ID. From it, I want a table showing how many unique IDs fall into each v3 category (each individual firm, plus "Multiple Firms"). For example:

    Narrow Dataset
    ID v1 v2 v3
    1 1 1 1
    2 1 1 1
    2 1 1 1
    3 2 1 2
    3 2 1 2
    4 2 2 99999
    4 3 2 99999
    5 1 3 99999
    5 2 3 99999
    5 3 3 99999
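A minimal sketch of one way to build v3 in the narrow dataset, assuming v1 holds integer firm codes and that 99999 is never a real firm code. It computes v2 with the running-sum idea from reply #2, then uses cond() to keep the firm code for single-firm IDs and 99999 otherwise. v3 is stored as long because 99999 does not fit in a byte or int. The value-label name v3lbl and the "Firm 3" label are illustrative.

```stata
* Example data retyped from the post (ID = person, v1 = firm code)
clear
input byte(ID v1)
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3
end

* v2: number of distinct firms per ID (running count of changes in v1,
* then the last value in each ID copied to all of its rows)
by ID (v1), sort: generate v2 = sum(v1 != v1[_n-1])
by ID (v1): replace v2 = v2[_N]

* v3: the firm itself for single-firm IDs, 99999 for multiple firms
generate long v3 = cond(v2 == 1, v1, 99999)

* Attach readable labels (label name and firm labels are illustrative)
label define v3lbl 1 "Firm 1" 2 "Firm 2" 3 "Firm 3" 99999 "Multiple Firms"
label values v3 v3lbl

list, sepby(ID)
```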

    Wide Dataset
    ID v3
    1 1
    2 1
    3 2
    4 99999
    5 99999
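Going from the narrow to the wide dataset only needs one row per ID, because v3 does not vary within an ID. A minimal sketch, starting from the narrow dataset as typed in the post; it checks that assumption with -assert- before dropping any rows.

```stata
* Narrow dataset retyped from the post (ID, v1, v2, v3)
clear
input byte(ID v1 v2) long v3
1 1 1 1
2 1 1 1
2 1 1 1
3 2 1 2
3 2 1 2
4 2 2 99999
4 3 2 99999
5 1 3 99999
5 2 3 99999
5 3 3 99999
end

* Check that v3 really is constant within each ID before dropping rows
by ID (v3), sort: assert v3[1] == v3[_N]

* Keep one row per ID, then keep only the variables needed here
by ID, sort: keep if _n == 1
keep ID v3

* isid exits with an error unless ID now uniquely identifies the rows
isid ID
list
```

In the full 1,000-variable dataset, `keep ID v3` would discard everything else, so you would keep whichever other variables are also constant within ID.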


    The table I would ultimately like to create:
    v3               Frequency   Percent
    Firm 1           2           40%
    Firm 2           1           20%
    Multiple Firms   2           40%
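With one row per ID, -tabulate- on a labeled v3 gives the frequency and percentage of IDs in each category directly. A sketch starting from the wide dataset in the post; the label name v3lbl and the "Firm 3" label are illustrative.

```stata
* Wide dataset retyped from the post (one row per ID)
clear
input byte ID long v3
1 1
2 1
3 2
4 99999
5 99999
end

* Readable category names (label name and firm labels are illustrative)
label define v3lbl 1 "Firm 1" 2 "Firm 2" 3 "Firm 3" 99999 "Multiple Firms"
label values v3 v3lbl

* Each row is one person, so Freq. counts IDs and Percent is their share
tabulate v3
```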
    Last edited by Brooke Claypole; 01 Mar 2023, 15:25.

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(id v1)
    1 1
    2 1
    2 1
    3 2
    3 2
    4 2
    4 3
    5 1
    5 2
    5 3
    end
    
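    * Within each id, sorted by v1, add 1 whenever v1 differs from the previous
    * observation (the first one always counts); sum() keeps a running total
    by id (v1), sort: gen v2 = sum(v1 != v1[_n-1])
    * The last observation holds the total; copy it to every observation of the id
    by id (v1): replace v2 = v2[_N]
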
    In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
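To see how the two-line solution above works, this sketch splits the first line into its parts and keeps the intermediate results so they can be listed (newfirm and running are hypothetical names). Under by:, v1[_n-1] is missing for the first observation of each id, so that observation always counts as a new firm; sum() then produces a running count, and the last observation of each id (_N) holds the total.

```stata
* Example data from the dataex listing above
clear
input byte(id v1)
1 1
2 1
2 1
3 2
3 2
4 2
4 3
5 1
5 2
5 3
end

* Step 1: flag rows where v1 differs from the previous row in the same id
by id (v1), sort: generate byte newfirm = v1 != v1[_n-1]

* Step 2: running count of distinct firms seen so far within the id
by id (v1): generate running = sum(newfirm)

* Step 3: the last row of each id holds the total; spread it to all rows
by id (v1): generate v2 = running[_N]

list, sepby(id)
```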


    • #3
      This worked great. Thank you, Clyde!
