excluding observations using restrict observation command

Jonathan David

Join Date: Jul 2014

Posts: 23
#1

24 Nov 2014, 20:16

Hi, I am having trouble excluding observations. I am trying to use the if qualifier: basically, I am trying to tell Stata to exclude observations based on the standardized score of one of my variables. I want it to remove those that are lower than -3.29 and those that are higher than 3.29.

zforaging>= -3.29 | zforaging <=3.29

Am I doing something wrong?
ben earnhart

Join Date: May 2014

Posts: 1027
#2

24 Nov 2014, 20:32

If you want to discard cases entirely, then it's:

Code:

keep if zforaging>= -3.29 & zforaging <=3.29

If you want to keep the cases but restrict which ones go into an analysis, it might be, for example, something like:

Code:

reg happiness zforaging gender income if zforaging>= -3.29 & zforaging <=3.29, vce(robust)

Note that your if needs to come before the comma spelling out options.
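
The difference between | and & is what made the original condition fail: every number is either >= -3.29 or <= 3.29, so the | version is true for every observation. Here is a minimal sketch of that point; the auto dataset and the variable zprice are stand-ins (assumptions for illustration), not the poster's data.

```stata
* Sketch only: auto data and zprice are stand-ins for the poster's data and zforaging
sysuse auto, clear
egen zprice = std(price)

* With | the condition is true for every value (and for missings), so nothing is excluded
count if zprice >= -3.29 | zprice <= 3.29

* With & only values inside [-3.29, 3.29] qualify
* Missing values count as larger than any number in Stata, so they fail the <= test
count if zprice >= -3.29 & zprice <= 3.29
```

The first count should equal the number of observations in the dataset; the second can only be the same or smaller.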

Last edited by ben earnhart; 24 Nov 2014, 20:33. Reason: typo
Jonathan David

Join Date: Jul 2014

Posts: 23
#3

24 Nov 2014, 20:47

thanks Ben!
Roberto Ferrer

Join Date: Apr 2014

Posts: 449
#4

24 Nov 2014, 20:49

See also -help inrange- which is more efficient.
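
As a rough sketch of how inrange() could replace the pair of inequalities (again with the auto dataset and a made-up zprice variable standing in for the poster's data):

```stata
* Sketch only: auto data and zprice are stand-ins for the poster's data and zforaging
sysuse auto, clear
egen zprice = std(price)

* inrange(z, a, b) is 1 when a <= z <= b and 0 otherwise, including when z is missing
count if inrange(zprice, -3.29, 3.29)

* Restrict one analysis without discarding anything
regress mpg weight if inrange(zprice, -3.29, 3.29), vce(robust)

* Or discard the other observations for good
keep if inrange(zprice, -3.29, 3.29)
```

The regression here is arbitrary; the point is only that the if qualifier still goes before the comma.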

You should:

1. Read the FAQ carefully.

2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
ben earnhart

Join Date: May 2014

Posts: 1027
#5

24 Nov 2014, 21:32

Oh, BTW Jonathan -- I just realized you were the guy who wanted to force a variable to be normally distributed. Seems like you turned it into z-scores, which is one way to achieve pseudo-normality. But seriously, you are better off using the full distribution, in its original shape. Check out this link: http://blog.stata.com/2011/08/22/use...tell-a-friend/
and maybe re-think your approach.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#6

25 Nov 2014, 03:10

Calculating a z-score, if that means (value - mean) / SD, does not change the shape of the distribution. I'm confident Ben knows that, but I wanted to warn others who might misread his last posting.
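
One way to check this is to compare shape measures before and after standardizing. This is a sketch under assumptions: the auto dataset and the zprice variable are illustrative stand-ins, and skewness and kurtosis are used as convenient summaries of shape.

```stata
* Sketch only: auto data and zprice are stand-ins, not the poster's data
sysuse auto, clear
egen double zprice = std(price)

* Skewness and kurtosis of the raw variable
quietly summarize price, detail
display "raw:          skewness = " r(skewness) "  kurtosis = " r(kurtosis)

* The same measures for the z-scores; only location and scale have changed
quietly summarize zprice, detail
display "standardized: skewness = " r(skewness) "  kurtosis = " r(kurtosis)
```

The two lines should agree up to rounding, because standardizing is a linear rescaling.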

I strongly agree with the larger point that dropping observations more than so many standard deviations away from the mean is usually a very poor way to deal with perceived distribution problems.
Jonathan David

Join Date: Jul 2014

Posts: 23
#7

25 Nov 2014, 10:19

Oh, well, I am using a different data set; this one was only slightly off due to a few outliers.
Nick Cox

Join Date: Mar 2014

Posts: 35698
#8

25 Nov 2014, 11:05

It's your analysis, manifestly, but your criterion looks as if it's based on

Code:

. di invnormal(.0005)
-3.2905267

But if your data contain "outliers" then mean and SD and the normal assumption are all dubious or unreliable at best. Basing your decisions on what the data would be if they were normal when the evidence is that they are not is like devising strategies for dealing with crime using the premise that all citizens are honest.

Rejecting outliers because the data look problematic or even non-normal is outdated logic. Most modern statistical analysis is based on realising that (a) the marginal distribution of the response is NOT important and (b) we can tailor analyses to whatever the conditional distribution is (most) like. Indeed, this was mainstream statistical thinking at least 40 years ago, when generalized linear models were introduced, or say 70 years ago, when people started realising that transformations could be useful.
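
To make the two alternatives concrete, here is a hedged sketch of what keeping all observations might look like. The auto dataset, the choice of gamma family with log link, and the log transformation are all illustrative assumptions, not a recommendation for the poster's data.

```stata
* Sketch only: auto data stand in for the poster's data; the model choices are illustrative
sysuse auto, clear

* A generalized linear model with a log link and gamma family, using every observation
glm price mpg weight, family(gamma) link(log) vce(robust)

* A transformation of the response is another option
generate double lnprice = ln(price)
regress lnprice mpg weight, vce(robust)
```

Which family, link, or transformation is sensible depends on the response and the science behind it.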