  • Trimming Outliers by Group

    I want to trim my data in Stata by dropping the top and bottom 1% of prices. I know how to do this for the whole sample using the -summarize- command, but I would like to do it by group: I have several industries and want to trim within each industry. In the end I would like to have one dummy variable (to_use) equal to one if an observation is not among the price outliers of its own industry.

    An example of my data:

    ----------------------------------
    Industry | Product | Price
    ----------------------------------
    Food     | Apples  | $ 10
    Food     | Fish    | $ 20
    Food     | Bread   | $ 5
    Cars     | Car A   | $ 100
    Cars     | Car B   | $ 200
    ----------------------------------
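
    For reference, the whole-sample approach mentioned above might look roughly like the sketch below. It is only an illustration under assumptions: the simulated prices and the seed are hypothetical, not the data in the question, and treating a price equal to a percentile as not trimmed is one choice among several.

    ```stata
    * Hypothetical data: 1,000 simulated prices (not the poster's data)
    clear
    set seed 2015
    set obs 1000
    generate price = exp(rnormal(3, 1))

    * Whole-sample version: -summarize, detail- leaves the 1st and 99th
    * percentiles in r(p1) and r(p99)
    summarize price, detail
    generate byte to_use = inrange(price, r(p1), r(p99))

    * Observations with to_use == 0 lie below r(p1) or above r(p99)
    tabulate to_use
    ```

    This handles the whole sample only; it does not yet work within industries, which is the point of the question.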

  • #2
    S. IAAIA, please note (as per FAQ #6): real names preferred.
    You are asked to post on Statalist using your full real name, including one or more given names and a family name or surname, such as "Ronald Fisher" or "Gertrude M. Cox". Giving full names is one of the ways in which we show respect for others and is a long tradition on Statalist.

    If you overlook this on first registration, it is easy to fix. Click on “Contact us” located at the bottom right-hand corner of every page.

    That said, you may be interested in -bysort-, which is documented under -help by-.
    Kind regards,
    Carlo
    (StataNow 18.5)

    • #3
      Cross-posted at http://stackoverflow.com/questions/3...stata-by-group

      This is another point covered in the FAQ Advice. We ask that you tell us about cross-posting.



      What you want is to tag observations, not drop them; also I note that extreme values need not be outliers.

      As Carlo indicates, something along these lines should get you closer:

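      Code:
      * flag missing prices so that they are sorted into groups of their own
      gen ismissing = missing(price)
      * within each group, sort on price and tag every observation between the two cut-off positions
      bysort ismissing groupvar (price) : gen touse = inrange(_n, 1 + ceil(_N/100), _N - ceil(_N/100))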
      If you don't have missing values, the code can be simplified.
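
      A minimal sketch of that simplified form, run on made-up data so that the counts can be checked. The variable name industry, the simulated prices and the seed are all assumptions for illustration. It uses inrange() so that every observation between the two cut-off positions is tagged.

      ```stata
      * Hypothetical data: two industries of 500 simulated prices each, no missing values
      clear
      set seed 2015
      set obs 1000
      generate industry = cond(_n <= 500, "Food", "Cars")
      generate price = exp(rnormal(3, 1))

      * No missing prices, so no ismissing variable is needed
      bysort industry (price) : generate byte touse = inrange(_n, 1 + ceil(_N/100), _N - ceil(_N/100))

      * Each industry has ceil(500/100) = 5 untagged observations in each tail
      tabulate industry touse
      ```

      With 500 observations per industry, positions 6 to 495 are tagged, so each industry keeps 490 observations and leaves 10 untagged.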
      Last edited by Nick Cox; 02 May 2015, 10:20.

      • #4
        There is an excellent message on the pitfalls of trimming outliers by - guess who - Nick, from March 2012: http://www.stata.com/statalist/archi.../msg01342.html

        Hopefully you'll appreciate that.

        Best,

        Marcos

        • #5
          I'll repeat an observation I made today in another thread.
          Trimming prior to analysis is not justified in any circumstance that I can think of. If, to take the simplest case, you trim, then take ordinary means, the computed standard errors will be incorrect.
          Last edited by Steve Samuels; 03 May 2015, 22:21.
          Steve Samuels
          Statistical Consulting

          Stata 14.2

          • #6
            However, you can get standard errors for trimmed means if you wish. http://www.stata-journal.com/article...article=st0313

            To me, the main point of trimmed means is exploratory. It's easy to look at how trimmed means change as the trimming fraction rises from 0% in each tail (the ordinary mean) towards 50% (the median).

            Any rule such as "trim 1%" is arbitrary and may seriously harm analysis.
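
            As a rough illustration of that exploratory use, the sketch below computes trimmed means over a range of trimming percentages using only official commands. The simulated prices, the seed and the chosen percentages are assumptions for illustration, and this is not the code from the Stata Journal article linked above.

            ```stata
            * Hypothetical data: 1,000 simulated right-skewed prices
            clear
            set seed 2015
            set obs 1000
            generate price = exp(rnormal(3, 1))
            sort price

            * Trimmed mean for several trimming percentages: 0 gives the ordinary
            * mean, and values close to 50 approach the median
            foreach p of numlist 0 1 5 10 25 45 {
                local k = floor(_N * `p' / 100)
                quietly summarize price if inrange(_n, 1 + `k', _N - `k')
                display "trim `p'% in each tail: mean = " %9.3f r(mean)
            }
            ```

            Looking at the whole sequence, rather than fixing one percentage in advance, shows how sensitive the summary is to the tails.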
