  • How to tag outliers?

    Hi, I would like to create a dummy variable that equals 1 if an observation is an outlier and 0 otherwise. I can use -extremes- to display the outliers, but not to tag the corresponding observations.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double var
     7.000e+08
     2.100e+09
     2.200e+08
     3.500e+08
      75000000
     1.200e+08
     4.000e+08
      40500000
      20000000
     5.000e+08
     2.000e+08
      85000000
     1.250e+09
     2.000e+08
     3.000e+09
     1.500e+09
     4.000e+08
      10000000
     1.240e+09
      75000000
     1.200e+08
     1.100e+08
     3.160e+09
     1.200e+09
     1.000e+09
     2.480e+08
      40000000
     1.500e+08
     8.760e+08
     1.000e+08
     1.100e+09
     1.700e+09
     3.500e+08
     2.000e+08
     1.400e+09
     1.500e+08
     1.400e+09
     4.000e+08
     1.800e+08
     1.250e+08
     6.000e+08
     1.000e+08
     1.500e+08
     4.000e+08
     4.000e+08
     2.400e+09
     3.000e+08
     3.500e+08
      30000000
     6.880e+08
     1.200e+09
    1135395000
     1.000e+08
     2.000e+08
     1.600e+09
      60000000
      75000000
     1.150e+08
      60000000
     4.500e+08
       1000000
     6.000e+08
     1.800e+08
     1.120e+09
     2.000e+08
     1.800e+08
     1.000e+09
      24000000
     8.000e+08
     1.800e+09
     8.700e+08
      62000000
     1.200e+09
     3.600e+10
     2.000e+08
      16000000
     3.000e+09
     6.000e+08
     4.500e+08
     4.800e+08
     3.600e+08
     840132000
     2.000e+08
     9.000e+08
     3.000e+08
      90000000
     1.050e+09
       8000000
     5.000e+08
     1.000e+09
     5.100e+08
       6000000
     3.000e+08
     5.000e+08
     1.672e+09
     1.600e+08
     2.000e+08
     2.600e+08
     3.200e+09
     2.400e+08
    end
    Code:
    extremes var, n(10)
    Any thoughts on how to generate the dummy?

    Thanks!

  • #2
    What’s your definition of an outlier?

    Your anonymous variable has values ranging from millions to billions, so a logarithmic scale would seem natural for thinking about its distribution.
    Last edited by Nick Cox; 01 Mar 2020, 10:00.


    • #3
      Just the extreme values listed by the -extremes- command.


      • #4
        Sounds familiar, somehow. But extremes (from SSC, as you are asked to explain) is for exploration, not for hard-and-fast stigmatization. What you want is programmable, but I am not volunteering.
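
        For readers who want a starting point, here is a minimal sketch of one way to program this. It assumes "outlier" means only what #3 describes: the 10 lowest and 10 highest nonmissing values of var, as requested with n(10). It is not part of -extremes- itself, and the names rank_lo, rank_hi and outlier are made up for illustration. Ties at the cutoff are broken arbitrarily by the unique option, so the tagged observations may not match the listing exactly when tied values straddle the boundary.

        ```stata
        * Assumption: "outlier" = the 10 lowest and 10 highest nonmissing values of var
        * Unique ranks from the bottom (1 = smallest value)
        egen double rank_lo = rank(var), unique
        * Unique ranks from the top (1 = largest value)
        egen double rank_hi = rank(-var), unique
        * Tag either tail; observations with missing var get a missing tag, not 0
        generate byte outlier = (rank_lo <= 10 | rank_hi <= 10) if !missing(var)
        drop rank_lo rank_hi
        ```

        Afterwards, something like list var if outlier == 1 can be compared against the output of extremes var, n(10). As #2 and #4 suggest, tagging by rank alone is a convenience rather than a definition of an outlier, and on data this skewed a logarithmic scale may give a more sensible picture.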
