
  • excluding observations using restrict observation command

    Hi, I am having trouble excluding observations. I am trying to use the if qualifier: basically, I am trying to tell Stata to exclude observations based on the standardized score of one of my variables. I want it to remove those lower than -3.29 and those higher than 3.29.

    zforaging>= -3.29 | zforaging <=3.29

    Am I doing something wrong?

  • #2
    If you want to discard cases entirely, then it's:
    Code:
    keep if zforaging>= -3.29 & zforaging <=3.29
    If you want to keep the cases, but want to restrict who goes into an analysis, it might be for example, something like:

    Code:
    reg happiness zforaging gender income if zforaging>= -3.29 & zforaging <=3.29, vce(robust)
    Note that your if qualifier needs to come before the comma that introduces the options.
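A minimal sketch of why the original condition excluded nothing, using a hypothetical five-value toy variable (not data from this thread). With | ("or"), every number is either at least -3.29 or at most 3.29, so the condition is true everywhere; & ("and") is what restricts to the interval. It also shows a Stata detail worth knowing: a missing value (.) counts as larger than any number, so it fails the <= 3.29 test and is dropped by the & version too.

```stata
* Hypothetical toy data: five z-scores, one of them missing
clear
input zforaging
-4
-1
0
3.5
.
end

* Original condition with | ("or"): every value is either >= -3.29 or
* <= 3.29, so the condition is true for all 5 observations
count if zforaging >= -3.29 | zforaging <= 3.29

* Corrected condition with & ("and"): true only inside [-3.29, 3.29].
* Stata treats missing (.) as larger than any number, so it fails
* zforaging <= 3.29 and is excluded as well; 2 observations qualify
count if zforaging >= -3.29 & zforaging <= 3.29
```

If missing values should be kept rather than dropped, they need to be handled explicitly (for example with a separate missing() condition).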
    Last edited by ben earnhart; 24 Nov 2014, 21:33. Reason: typo



    • #3
      Thanks, Ben!



      • #4
        See also -help inrange-, which is more efficient.
        You should:

        1. Read the FAQ carefully.

        2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

        3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

        4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.
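A short sketch of the inrange() version, again on hypothetical toy data. inrange(z, a, b) is 1 when a <= z <= b; with non-missing bounds, as here, a missing z gives 0, so the result matches the & condition from #2, missing values included.

```stata
* Hypothetical toy data, self-contained for this example
clear
input zforaging
-4
-1
0
3.5
.
end

* Equivalent to: zforaging >= -3.29 & zforaging <= 3.29
count if inrange(zforaging, -3.29, 3.29)

* Permanently drop observations outside the range (and missing ones)
keep if inrange(zforaging, -3.29, 3.29)
```

The same expression can go in a model's if qualifier instead of keep if the observations should stay in the dataset.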



        • #5
          Oh, BTW Jonanthan, I just realized you were the guy who wanted to force a variable to be normally distributed. It seems you turned it into z-scores, which is one way to achieve pseudo-normality. But seriously, you're better off using the full distribution in its original shape. Check out this link: http://blog.stata.com/2011/08/22/use...tell-a-friend/
          and maybe rethink your approach.



          • #6
            Calculating z-scores, if that means (value - mean) / SD, does not change the shape of the distribution. I'm confident Ben knows that, but I wanted to warn others who might misread his last post.

            I strongly agree with the larger point that dropping observations more than so many standard deviations away from the mean is usually a very poor way to deal with perceived distribution problems.
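A quick check of the point that z-scores keep the distribution's shape, assuming a simulated right-skewed (lognormal) variable rather than the poster's data. Standardizing is a linear rescaling with a positive SD, so the skewness reported by summarize, detail should be the same, up to rounding, before and after egen's std() function.

```stata
* Hypothetical right-skewed toy variable (lognormal)
clear
set obs 1000
set seed 12345
generate double x = exp(rnormal())

* z-score: (x - mean) / SD, stored in double precision
egen double z = std(x)

* Skewness of the original and the standardized variable
summarize x, detail
scalar skew_x = r(skewness)
summarize z, detail
scalar skew_z = r(skewness)
display "skewness of x: " skew_x "   skewness of z: " skew_z
```

The z-scores are just as skewed as x, so standardizing on its own does not make data any more normal.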



            • #7
               Oh, well, I am using a different dataset; this one was only slightly off due to a few outliers.



              • #8
                It's your analysis, manifestly, but your criterion looks as if it's based on

                Code:
                 
                . di invnormal(.0005)
                -3.2905267
                But if your data contain "outliers" then mean and SD and the normal assumption are all dubious or unreliable at best. Basing your decisions on what the data would be if they were normal when the evidence is that they are not is like devising strategies for dealing with crime using the premise that all citizens are honest.
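One way to see why mean and SD are unreliable when outliers are present, sketched with hypothetical toy data (the values 1 to 9 plus a single extreme value, 100). The outlier pulls up the mean and inflates the SD used to standardize it, so its own z-score stays small. With n observations and the usual sample SD, no z-score can exceed (n - 1)/sqrt(n), about 2.85 for n = 10, so a +/-3.29 cut-off cannot flag anything in a sample this small.

```stata
* Hypothetical toy data: the values 1 to 9 plus one extreme value, 100
clear
set obs 10
generate double x = _n
replace x = 100 in 10

* z-scores use the mean and SD, which the outlier itself inflates
egen double z = std(x)
list x z in 10

* With n = 10, no z-score can exceed (n - 1)/sqrt(n), about 2.85,
* so the +/-3.29 rule flags nothing, not even the value 100
count if !inrange(z, -3.29, 3.29)
```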

                 Rejecting outliers because the data look problematic or even non-normal is outdated logic. Most modern statistical analysis is based on realising that (a) the marginal distribution of the response is NOT important and (b) we can tailor analyses to whatever the conditional distribution is (most) like. Indeed, this was mainstream statistical thinking at least 40 years ago, when generalized linear models were introduced, or, say, 70 years ago, when people started realising that transformations could be useful.
