  • Counting number of distinct string values across multiple variables

    Hello!

    I am using Stata SE 14.2 and am a relatively new user. I'm working with a medium-sized dataset of administrative educational data, and I am trying to count all of the schools ever attended by any participant. The dataset is longitudinal and the schools change over time, with the school attended at each timepoint stored in its own string variable: for example, each id has SchoolYear1, SchoolYear2, and SchoolYear3. I want to count the number of distinct string values across these variables. I tried following these instructions (http://www.stata.com/support/faqs/da...tinct-strings/) but ended up with a count of schools for each id, rather than the total number of distinct schools. Any help would be much appreciated!

    Thanks in advance!

  • #2
    So, the same general approach:

    Code:
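    * Stack SchoolYear1, SchoolYear2, ... into one variable, one row per id-year
    reshape long SchoolYear, i(id) j(year)
    rename SchoolYear school
    * Empty strings are not schools
    drop if missing(school)
    * After sorting, each distinct school starts a new run; count the runs
    sort school
    gen long school_num = sum(school != school[_n-1])
    display "Total number of schools = " =school_num[_N]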
    Note: No example data was provided, so this code is not tested. Beware of typos or other glitches.

    By the way, I don't know what else you plan for this data, but it will likely be easier to work with if you leave it in the long layout. Most things in Stata, like this problem, are easier to solve with long data than with wide.
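
    Since no example data was posted, here is a minimal sketch of the same approach on made-up data. The ids and school names ("Adams", "Baker", and so on) are invented purely for illustration, and the variable names are assumed to match those in the question. The preserve/restore pair is an optional addition, not part of the code above: it brings the wide layout back afterwards in case that is wanted.

```stata
* Toy data: ids and school names are hypothetical, for illustration only
clear
input id str10 SchoolYear1 str10 SchoolYear2 str10 SchoolYear3
1 "Adams" "Adams" "Baker"
2 "Baker" "Clark" ""
3 "Adams" ""      "Dewey"
end

* Keep a copy of the wide data so it can be restored afterwards
preserve

* One row per id-year, with the school name in a single variable
reshape long SchoolYear, i(id) j(year)
rename SchoolYear school

* Drop years with no school recorded
drop if missing(school)

* Running count of distinct schools: increases by 1 at each new name
sort school
gen long school_num = sum(school != school[_n-1])
display "Total number of schools = " school_num[_N]

* Back to the original wide layout
restore
```

    With these toy values the distinct schools are Adams, Baker, Clark, and Dewey, so the display line should report 4. Drop the preserve and restore lines to stay in the long layout.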
