
  • Add a new variable which counts the number of rows corresponding to an identifying variable

    Hi there, I'd be very grateful for your advice.

    I'm trying to add a new variable to a large dataset which counts up the number of "events" for each person. The person is represented by an identity code in column 1. Each event is represented by a new row, which includes relevant data about the event. An example of these data is included below:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(identity event age event_code)
    567283 . 65 6785
         . . 65   34
         . . 65  345
         . . 65  345
         . . 65  432
    652361 . 89  567
         . . 89 4832
         . . 89 3857
    735273 . 90 3885
         . . 90  235
         . . 90 6921
         . . 90  684
    567426 . 54 2196
         . . 54   42
    end

    For the "event" variable, I am trying to add the following information throughout the dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(identity event age event_code)
    567283 1 65 6785
         . 2 65   34
         . 3 65  345
         . 4 65  345
         . 5 65  432
    652361 1 89  567
         . 2 89 4832
         . 3 89 3857
    735273 1 90 3885
         . 2 90  235
         . 3 90 6921
         . 4 90  684
    567426 1 54 2196
         . 2 54   42
    end

    I've been unable to find a solution to this online. Many thanks for your help.

  • #2
    I assume that the observations with missing value for identity are attributable to the nearest non-missing identity above them in your listing, not that the entity to which the event pertains is unknown. Stata, however, will not assume that, and this is a dangerous way to represent the data in Stata. So the first step is to spread the identity variable to the observations it applies to:

    Code:
    replace identity = identity[_n-1] if missing(identity)

    Then you can fill in the variable event with:

    Code:
    sort identity, stable
    by identity: replace event = _n

    Thank you for using -dataex- on your very first post here.
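
    For reference, the two steps above can be run end to end on the example data from post #1. This is only a sketch: the helper variables obs_order and n_events are names made up for illustration and are not part of the original dataset. It assumes, as above, that each missing identity belongs to the nearest non-missing identity above it. Tracking the original row order explicitly is an alternative to -sort, stable- and lets the listing be restored to its original order afterwards.

    ```stata
    clear
    input float(identity event age event_code)
    567283 . 65 6785
         . . 65   34
         . . 65  345
         . . 65  345
         . . 65  432
    652361 . 89  567
         . . 89 4832
         . . 89 3857
    735273 . 90 3885
         . . 90  235
         . . 90 6921
         . . 90  684
    567426 . 54 2196
         . . 54   42
    end

    * Hypothetical helper: remember the original row order
    gen long obs_order = _n

    * Carry each identity down to the rows beneath it.
    * -replace- works through the observations in order, so a run of
    * missing values is filled one row at a time from the row above.
    replace identity = identity[_n-1] if missing(identity)

    * Number the events within each person, keeping the original order
    bysort identity (obs_order): replace event = _n

    * Hypothetical extra: total number of events per person
    by identity: gen long n_events = _N

    * Restore the original row order and inspect the result
    sort obs_order
    list identity event age event_code n_events, sepby(identity)
    ```

    If only the running count is needed, the n_events line can be dropped. It is included because the thread title asks for a count of rows per identifier, which _N gives directly.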



    • #3
      That's really helpful, Clyde, thank you.
