Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting whether variables match

    Hi everyone!

    I am working on a project and have been stuck trying to figure out part of it. I've been googling and searching thru the forums and haven't found anything that works yet, but it is possible that is because I don't know the way to correctly describe in search terms what it is that I am trying to do. I can explain it better here, if anyone has any suggestions on code to use or can point me in the right direction.

    The dataset I am using is a set of traffic citations issued. One variable is 'officercode' and another variable is the 'dor_code' (which refers to the municipality the ticket was issued in). I am trying to determine the percent of officers that issued tickets in multiple municipalities (i.e. outside of their home municipality). To do so, I need to count the number of times that 'officercode' is listed in the dataset with more than one 'dor_code'. Here's a small example:

    officercode dor_code
    10 1
    10 1
    10 2
    10 1
    20 2
    20 2
    20 2
    30 3
    30 4
    30 4

    Based on this example, 2 of the 3 officers have issued tickets outside of their home municipality (i.e. 'officercode' 10 and 30 are both associated with 2 different 'dor_code' ).

    I don't need to calculate the number of times that tickets were issued outside of their home municipality, just the number of officers. So my code needs to count the number of 'officercode' that are associated with more than one 'dor_code'.

    Thanks in advance for any help!!

  • #2
    Two contracts should do it.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte(officercode dor_code)
    10 1
    10 1
    10 2
    10 1
    20 2
    20 2
    20 2
    30 3
    30 4
    30 4
    end
    
    contract officercode dor_code
    contract officercode, freq(freq)
    Res.:

    Code:
    . l
    
         +-----------------+
         | office~e   freq |
         |-----------------|
      1. |       10      2 |
      2. |       20      1 |
      3. |       30      2 |
         +-----------------+

    Comment


    • #3
      I understood the question a bit differently than Andrew Musau did. The following code retains the original data as is, and generated a frequency table showing the number of officers who issued tickets in two or more municipalities. HTH.

      Code:
      clear *
      input officercode dor_code
      10 1
      10 1
      10 2
      10 1
      20 2
      20 2
      20 2
      30 3
      30 4
      30 4
      end
      
      * Sort by officercode and dor_code, then flag officers
      * who have different dor_codes on records 1 and _N
      bysort officercode (dor_code): ///
      generate byte twoplus = dor_code[1] != dor_code[_N]
      label variable twoplus "Officer has 2 or more DOR codes"
      label define noyes 0 "No" 1 "Yes"
      label values twoplus noyes
      * Flag the first record for each officer for use in -tabulate-
      by officercode: generate byte rec1 = _n == 1
      list, clean
      tabulate twoplus if rec1
      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)

      Comment


      • #4
        thank you so much!!

        Comment


        • #5
          I see this as about counting distinct values. See also https://www.stata-journal.com/articl...article=dm0042

          Code:
          clear *
          input officercode dor_code
          10 1
          10 1
          10 2
          10 1
          20 2
          20 2
          20 2
          30 3
          30 4
          30 4
          end
          
          egen tag = tag(officercode dor_code)
          
          egen n_distinct = total(tag), by(officercode)
          
          bysort officercode : gen n_total = _N 
          
          tabdisp officercode , c(n_total n_distinct)
          
          ----------------------------------
          officerco |
          de        |    n_total  n_distinct
          ----------+-----------------------
                 10 |          4           2
                 20 |          3           1
                 30 |          3           2
          ----------------------------------

          Comment

          Working...
          X