
  • Count of unique offences

    Please help me get the total number of unique offences (offnc_id) for each month of each year, by location.

    input float (date month year) str44(ir_full offnc_id) float location
    19359 1 2013 "xxxxx" "bbbb" 0
    19359 1 2013 "eeee" "cccc" 0
    19359 1 2013 "eeee" "dddd" 0
    19359 2 2014 "eeee" "dddd" 1
    19360 2 2014 "eeee" "dddd" 2
    19391 2 2014 "xxxx" "eeee" 2

    Thanks

  • #2
    First, it is unclear what you mean by "month" here. You have both a date variable and separate month and year variables, but they sometimes refer to different months. In particular, the fourth and fifth observations have month = 2, but date is in January. In the code below, I assume that you want to group into months relying on the date variable and ignore the month and year variables. It may be that you want to do it the other way: use month and year but ignore date. In that case -gen mdate = ym(year, month)- will get you that (note that ym() takes the year first). Finally, if date and month/year are supposed to refer to the same thing, then your data are in error and you should review the data management that created them to find the error(s) responsible and fix them, as well as any other unrelated errors that you find along the way.
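
    One way to locate the disagreements before deciding which variables to trust is sketched below. It assumes the example data are in memory and that date is a Stata daily date; mismatch is a hypothetical variable name introduced only for illustration.

    ```stata
    * Flag observations whose month/year variables disagree with date
    * (mismatch is a hypothetical name, not part of the original data)
    gen byte mismatch = (month != month(date)) | (year != year(date))

    * Show date in readable form and inspect the offending rows
    format date %td
    list date month year if mismatch
    count if mismatch
    ```

    Any observations listed are the ones to trace back through the data management.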

    Next, every observation has a "unique" offence, but some of them are not distinct. For example, in January 2013, there are five unique offences, but "dddd" appears three times, so there are only three distinct offences. I assume you are interested in the number of distinct offences.

    Code:
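    * monthly date derived from the daily date variable
    gen mdate = mofd(date)
    format mdate %tm
    
    * within each month, sorted by offnc_id, count each change of value
    by mdate (offnc_id), sort: gen wanted = sum(offnc_id != offnc_id[_n-1])
    * copy the final running count to every observation in the month
    by mdate: replace wanted = wanted[_N]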
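
    The question also asks for the counts by location. A minimal sketch extending the same logic, assuming mdate has already been created as above; wanted_loc is a hypothetical name chosen so that it does not clash with wanted.

    ```stata
    * Distinct offences within each location-month
    * (wanted_loc is a hypothetical name; mdate is assumed to exist)
    * sorting on offnc_id inside each group puts equal values together
    bysort location mdate (offnc_id): gen wanted_loc = sum(offnc_id != offnc_id[_n-1])

    * the last running count in each group is that group's total
    by location mdate: replace wanted_loc = wanted_loc[_N]
    ```

    As with the original code, an empty offnc_id would not be counted as a distinct value, because the comparison for the first observation in a group is against an empty string.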



    • #3
      Hi Clyde, thanks. I will check the data for errors, but yes, that's correct: I am interested in distinct offences.



      • #4
        Hi Clyde, thanks. I will check the data for errors, but yes, that's correct: I am interested in distinct offences. I have just added "location" to the syntax so that we count distinct offences per location per month.



        • #5
          See also

          Code:
          . search dm0042, entry
          
          Search of official help files, FAQs, Examples, and Stata Journals
          
          SJ-23-4 dm0042_5  . . . . . . . . . . . . . . . . Software update for distinct
                  (help distinct, distinctgen if installed)  N. J. Cox and G. M. Longton
                  Q4/23   SJ 23(4):1096
                  comments out (and thus removes) a call to clear Mata at the
                  close of work, which was frustrating some other projects
                  also using Mata
          
          SJ-23-2 dm0042_4  . . . . . . . . . . . . . . . . Software update for distinct
                  (help distinct, distinctgen if installed)  N. J. Cox and G. M. Longton
                  Q2/23   SJ 23(2):595--596
                  most important change is addition of distinctgen command
          
          SJ-20-4 dm0042_3  . . . . . . . . . . . . . . . . Software update for distinct
                  (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
                  Q4/20   SJ 20(4):1028--1030
                  sort() option has been added
          
          SJ-15-3 dm0042_2  . . . . . . . . . . . . . . . . Software update for distinct
                  (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
                  Q3/15   SJ 15(3):899
                  improved table format and display of large numbers of
                  observations
          
          SJ-12-2 dm0042_1  . . . . . . . . . . . . . . . . Software update for distinct
                  (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
                  Q2/12   SJ 12(2):352
                  options added to restrict output to variables with a minimum
                  or maximum of distinct values
          
          SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
                  (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
                  Q4/08   SJ 8(4):557--568
                  shows how to answer questions about distinct observations
                  from first principles; provides a convenience command

          The original 2008 paper discusses principles (including the difference between unique and distinct), so it covers more than just the distinct command.

          As Clyde Schechter exemplifies, you can get a count just from first principles.

          Typing that search command with dm0042 will find any updates available at the time of reading, not just at the time of this writing.
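
          As another first-principles route, here is a sketch using the official egen functions tag() and total(). It re-enters the example data from #1 so that it can be run on its own, and it groups by the date variable as in #2; tag and n_distinct are hypothetical names, not output of the distinct command.

          ```stata
          clear
          input float(date month year) str44(ir_full offnc_id) float location
          19359 1 2013 "xxxxx" "bbbb" 0
          19359 1 2013 "eeee" "cccc" 0
          19359 1 2013 "eeee" "dddd" 0
          19359 2 2014 "eeee" "dddd" 1
          19360 2 2014 "eeee" "dddd" 2
          19391 2 2014 "xxxx" "eeee" 2
          end

          * monthly date from the daily date (assumption: date is the variable to trust)
          gen mdate = mofd(date)
          format mdate %tm

          * tag exactly one observation per distinct location-month-offence combination
          egen byte tag = tag(location mdate offnc_id)

          * the number of tagged observations in a location-month is its distinct count
          egen n_distinct = total(tag), by(location mdate)

          sort location mdate offnc_id
          list location mdate offnc_id n_distinct, sepby(location mdate)
          ```

          Every observation then carries the distinct count for its own location and month.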

