Trimming Outliers by Group

S. IAAIA

Join Date: May 2015

Posts: 12
#1


02 May 2015, 07:17

I want to trim my data in Stata by dropping the top and bottom 1% of prices. I know how to do this for the whole dataset using the -summarize- command, but I would like to do it by group: I have several industries and want to trim within each industry. In the end I would like one dummy variable (to_use) equal to one if the observation is not among the price outliers of its industry.

An example of my data:

----------------------------------
Industry | Product | Price
----------------------------------
Food     | Apples  | $ 10
Food     | Fish    | $ 20
Food     | Bread   | $ 5
Cars     | Car A   | $ 100
Cars     | Car B   | $ 200
----------------------------------
Tags: by group, data, foreach, outliers, trim
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17709
#2

02 May 2015, 08:18

S. IAAIA, please note:

(As per FAQ #6) Real names preferred.
You are asked to post on Statalist using your full real name, including one or more given names and a family name or surname, such as "Ronald Fisher" or "Gertrude M. Cox". Giving full names is one of the ways in which we show respect for others and is a long tradition on Statalist.

If you overlook this on first registration, it is easy to fix. Click on “Contact us” located at the bottom right-hand corner of every page.

That said, you may be interested in -bysort-, which is documented under -help by-.
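A minimal sketch of the -bysort- idea, using the toy data from the question. The variable names (industry, product, price, p1, p99, to_use) are assumptions for illustration, and the percentile-based rule shown here is only one possible reading of "top/bottom 1%":

```stata
* Toy data typed in from the question; all variable names are hypothetical
clear
input str4 industry str6 product price
"Food" "Apples"  10
"Food" "Fish"    20
"Food" "Bread"    5
"Cars" "Car A"  100
"Cars" "Car B"  200
end

* 1st and 99th percentiles of price, computed separately within each industry
bysort industry: egen p1  = pctile(price), p(1)
bysort industry: egen p99 = pctile(price), p(99)

* Tag (rather than drop) observations between the two cutoffs, inclusive
gen byte to_use = inrange(price, p1, p99) & !missing(price)

list industry product price p1 p99 to_use, sepby(industry)
```

With groups this small the percentile cutoffs simply fall on observed prices, so the sketch only shows the mechanics; on real data the result depends on how ties at the cutoffs should be treated.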

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

Join Date: Mar 2014

Posts: 35698
#3

02 May 2015, 09:11

Cross-posted at http://stackoverflow.com/questions/3...stata-by-group

This is another point covered in the FAQ Advice. We ask that you tell us about cross-posting.

What you want is to tag observations, not drop them; also I note that extreme values need not be outliers.

As Carlo indicates, something along these lines should get you closer:

Code:

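gen ismissing = missing(price)
bysort ismissing groupvar (price) : gen touse = inrange(_n, 1 + ceil(_N/100), _N - ceil(_N/100))
* observations with missing price are ranked among themselves, so never tag them for use
replace touse = 0 if ismissing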

If you don't have missing values, the code can be simplified.
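For what it's worth, the simplification might look something like the sketch below when price is never missing. The toy data and the variable name industry (standing in for groupvar) are assumptions taken from the question, not part of the code above:

```stata
* Toy data with no missing prices; variable names are hypothetical
clear
input str4 industry price
"Food"  10
"Food"  20
"Food"   5
"Cars" 100
"Cars" 200
end

* Sort by price within industry, then tag observations whose rank lies
* strictly inside the lowest and highest ceil(_N/100) positions
bysort industry (price) : gen byte touse = inrange(_n, 1 + ceil(_N/100), _N - ceil(_N/100))

list, sepby(industry)
```

Because ceil(_N/100) is at least 1, this rule always sets aside at least one observation in each tail, so in groups as small as these almost nothing is tagged for use. That is worth checking before applying it to industries with few products.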

Last edited by Nick Cox; 02 May 2015, 09:20.
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#4

02 May 2015, 13:27

There is an excellent message on the pitfalls of trimming outliers by - guess who - Nick, from March 2012: http://www.stata.com/statalist/archi.../msg01342.html

Hopefully you'll appreciate that.

Best,

Marcos

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#5

03 May 2015, 21:19

I'll repeat an observation I made today in another thread.

Trimming prior to analysis is not justified in any circumstance that I can think of. If, to take the simplest case, you trim and then take ordinary means, the computed standard errors will be incorrect.

Last edited by Steve Samuels; 03 May 2015, 21:21.

Steve Samuels
Statistical Consulting

Stata 14.2
Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

04 May 2015, 01:18

However, you can get standard errors for trimmed means if you wish. http://www.stata-journal.com/article...article=st0313

To me, the main point of trimmed means is exploratory. It's easy to look at how trimmed means change as the trimming fraction goes from zero (the mean) toward 50% (the median).
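As a rough illustration of that exploratory use, the sketch below computes trimmed means by hand for a hypothetical five-value price variable, trimming 0, 1 and 2 observations from each tail. It uses only built-in commands and is not the approach of the article linked above:

```stata
* Hypothetical toy data: a single variable named price
clear
input price
5
10
20
100
200
end

sort price

* Drop k observations from each tail and report the mean of what remains;
* k = 0 gives the ordinary mean, k = 2 leaves only the median here
forvalues k = 0/2 {
    quietly summarize price if inrange(_n, 1 + `k', _N - `k'), meanonly
    display "trimmed " `k' " per tail: mean = " r(mean)
}
```

Seeing how much the result moves between the untrimmed mean and the median is a quick way to judge how sensitive a summary is to the tails.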

Any rule such as "trim 1%" is arbitrary and may seriously harm the analysis.