  • Obtaining a mode for each unique value

    Hello! I have a data set of breathalyzer tests. There is a variable for arresting agency and zip code at location of test. My task is to make a unique list of arresting agencies matched with the most common zip code and the percent of the tests from that agency in the zip code. This is what I have so far:
    Code:
    #Find the mode for each arresting agency. (`modes' not working with bysort. find another way)
    bysort arrestingagency: gen mode_zip = modes zip
    #Find percent of cases with modal zip (count not working)
    by arrestingagency: gen perc_zip = (count if zip == mode_zip)/_N
    #Create unique list
    by arrestingagency: keep if _n=1
    I am getting errors, but after a search of the internet I cannot find any way to do what I am trying to. Thanks for your help!


  • #2
    Why not use the "mode()" function of the -egen- command? See
    Code:
    help egen
    Note that if you supply a -dataex- example, example code can be supplied in return.
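
As a minimal sketch of this suggestion, the following uses a small made-up dataset (the values and the short agency names are illustrative, not from the original data). By default egen's mode() returns missing when a group has more than one mode; the minmode option is one way to break ties, here by taking the lowest tied value.

```stata
* Hypothetical toy data: agency "A" has a clear modal zip, agency "B" has a tie
clear
input long zip str10 arrestingagency
43402 "A"
43402 "A"
43610 "A"
45365 "B"
45306 "B"
end

* Modal zip within each agency; minmode picks the lowest value when modes tie
bysort arrestingagency: egen mode_zip = mode(zip), minmode

list, sepby(arrestingagency)
```

The maxmode option would pick the highest tied value instead; which rule is appropriate depends on the data.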


    • #3
      Using the "mode()" function fixed the first line of my code, thanks. The second line is still not working. I have had this recurring issue where I can't figure out how to do statements within a by command.
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long zip str30 arrestingagency
      45334 "BOTKINS POLICE DEPARTMENT"   
      45871 "BOTKINS POLICE DEPARTMENT"   
      45365 "BOTKINS POLICE DEPARTMENT"   
      45895 "BOTKINS POLICE DEPARTMENT"   
      45336 "BOTKINS POLICE DEPARTMENT"   
      45417 "BOTKINS POLICE DEPARTMENT"   
      45806 "BOTKINS POLICE DEPARTMENT"   
      46530 "BOTKINS POLICE DEPT"         
      45306 "BOTKINS POLICE DEPT"         
      43610 "BOWLING GREEN"               
      43402 "BOWLING GREEN"               
      44841 "BOWLING GREEN HIGH PATROL"   
      43545 "BOWLING GREEN HIGHWAY PATROL"
      43607 "BOWLING GREEN HIGHWAY PATROL"
      43623 "BOWLING GREEN HIGHWAY PATROL"
      43528 "BOWLING GREEN HIGHWAY PATROL"
      44841 "BOWLING GREEN HIGHWAY PATROL"
      43402 "BOWLING GREEN HIGHWAY PATROL"
      43617 "BOWLING GREEN HIGHWAY PATROL"
      end


      • #4
        To learn how to write statements with the -by- prefix, you need to read the manual.

        If you do not want to read the manual you are better off using -egen-.

        I think in your second statement you want to do

        Code:
        egen perc_zip = mean(zip == mode_zip), by(arrestingagency)
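
This works because a true-or-false expression such as zip == mode_zip evaluates to 1 when true and 0 when false, so its mean within a group is the proportion of observations for which it is true. A minimal sketch on made-up data follows (the values are hypothetical, and minmode is an assumed choice for breaking ties). It scales the result to a percent and keeps one row per agency, using == rather than = in the if qualifier.

```stata
* Hypothetical toy data, not from the original thread
clear
input long zip str10 arrestingagency
43402 "A"
43402 "A"
43610 "A"
45365 "B"
45306 "B"
end

bysort arrestingagency: egen mode_zip = mode(zip), minmode

* (zip == mode_zip) is 1 or 0, so its group mean is a proportion; scale to percent
bysort arrestingagency: egen perc_zip = mean(100 * (zip == mode_zip))

* One row per agency; comparison needs ==, not =
bysort arrestingagency: keep if _n == 1

list arrestingagency mode_zip perc_zip
```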


        • #5
          In #1 there are two guesses that a command can be used as an expression fed to generate. The presumption is that Stata will know to substitute the results of running that command.

          That is unfortunately just wishful thinking.
          generate does not work like that.
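
What generate does accept is an expression, and under by: the system variables _n and _N refer to positions within each group. A minimal sketch on made-up data (the values are hypothetical, and minmode is an assumed tie-breaking choice): egen's total() counts the observations matching the modal zip, and dividing by _N gives the share that #1 was trying to compute.

```stata
* Hypothetical toy data, not from the original thread
clear
input long zip str10 arrestingagency
43402 "A"
43402 "A"
43610 "A"
45365 "B"
45306 "B"
end

bysort arrestingagency: egen mode_zip = mode(zip), minmode

* Count of observations at the modal zip, per agency
bysort arrestingagency: egen n_modal = total(zip == mode_zip)

* Under by:, _N is the number of observations in the current group
bysort arrestingagency: gen perc_zip = 100 * n_modal / _N

list, sepby(arrestingagency)
```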

          The command modes is a community-contributed command that goes back to 1999. You're asked to explain any such command you refer to (FAQ Advice #12). modes has never supported by:. The idea that it should is interesting and perfectly sensible, but essentially it has been superseded by egen, mode(), as Rich Goldstein recommended. That is part of the official code, even though adapted from a community contribution some time ago, also formally published in 1999.

          Here are the details of modes, although all you are asked to report is that it is from the Stata Journal.

          SJ-9-4 sg113_2 . . . . . . . . . . . . . . . . . . . . . Tabulation of modes
          (help modes if installed) . . . . . . . . . . . . . . . . . N. J. Cox
          Q4/09 SJ 9(4):652
          update to allow the generate() option to record in an
          indicator variable of which observations contain values
          matching any of the modes displayed

          SJ-3-2 sg113_1 . . . . . . . . . . . . . . . . . . Software update for modes
          (help modes if installed) . . . . . . . . . . . . . . . . . N. J. Cox
          Q2/03 SJ 3(2):211
          provides new option for specifying the number of modes to
          be shown

          STB-50 sg113 . . . . . . . . . . . . . . . . . . . . . . Tabulation of modes
          (help modes if installed) . . . . . . . . . . . . . . . . . N. J. Cox
          7/99 pp.26--27; STB Reprints Vol 9, pp.180--181
          provides table of most frequent observations (modes)



          This code assumes that every observation is a case with a breathalyzer test. It produces the modes and their associated percents directly. It ignores any problems with ties.

          Code:
           
           contract arrestingagency  zip   bysort arrestingagency : egen pc = pc(_freq)  drop if missing(pc)  bysort arrestingagency (pc) :  keep if _n == _N


          • #6
            Sorry, the code above was posted late where I am. Here is a less mangled version:

            Code:
            contract arrestingagency  zip
            bysort arrestingagency : egen pc = pc(_freq)
            drop if missing(pc)
            bysort arrestingagency (pc) :  keep if _n == _N
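
For reference, a sketch of the same approach run end to end on made-up data (the values are hypothetical), with the final step grouped by arrestingagency. Adding zip to the sort order in the last step is an assumption made here so that tied zips are resolved reproducibly (the highest tied zip is kept); it is not part of the code above.

```stata
* Hypothetical toy data, not from the original thread
clear
input long zip str10 arrestingagency
43402 "A"
43402 "A"
43610 "A"
45365 "B"
45306 "B"
end

* One row per agency-zip pair, with the number of tests in _freq
contract arrestingagency zip

* Percent of each agency's tests that fall in each zip
bysort arrestingagency: egen pc = pc(_freq)

* Keep the zip with the highest percent; zip in the sort order breaks ties
bysort arrestingagency (pc zip): keep if _n == _N

list
```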
