  • Winsorize by group

    Hi,

    I am trying to winsorize (top and bottom 0.5%) the variables "cogs", "sales", and "sga" by group, where the groups are defined by "year" and "q". My sample looks like this:
    year q cogs sales sga
    1989 1 3.564 5.194 2.153
    1989 2 3.072 4.384 2.067
    1989 3 2.947 4.776 2.304
    1989 4 3.09 4.913 2.392
    1990 1 2.167 3.506 2.049
    1990 2 1.919 3.153 1.749
    1990 3 1.573 2.678 1.633
    1989 1 81.661 108.635 12.481
    1989 2 75.96599999999999 104.566 13.819
    1989 3 84.405 112.278 13.622
    1989 4 92.26900000000001 119.396 13.99
    1990 1 86.875 116.092 13.817
    1990 2 93.25700000000001 115.808 13.68
    1990 3 91.708 117.82 14.768
    1990 4 90.749 116.822 16.28
    1991 1 83.364 107.339 13.804
    1991 2 79.283 101.948 14.256

    Thanks a lot for your help


  • #2
    You don't tell us what code you tried.

    winsor2 (Lian Yu-jun, SSC) offers support for by:
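
For readers who want to see what winsorizing within groups does, here is a minimal language-neutral sketch in Python. It is not the code behind winsor2 or any other Stata command. The function names and the linear-interpolation percentile rule are assumptions made for illustration, and a different percentile rule would give slightly different cutoffs.

```python
from collections import defaultdict


def percentile(sorted_vals, p):
    """Percentile p (0..100) of a pre-sorted list, by linear interpolation.

    Assumption: this interpolation rule was chosen for simplicity; it is not
    claimed to match the rule used by any particular Stata command.
    """
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize_by_group(rows, group_keys, value_keys, lower=0.5, upper=99.5):
    """Return copies of rows with each value clamped to its group's cutoffs."""
    # Collect the rows that belong to each group, e.g. each (year, q) pair.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row)

    # Compute the lower and upper cutoffs separately per group and variable.
    cutoffs = {}
    for key, members in groups.items():
        for var in value_keys:
            vals = sorted(r[var] for r in members)
            cutoffs[key, var] = (percentile(vals, lower), percentile(vals, upper))

    # Clamp each value; no observation is dropped and row order is kept.
    result = []
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        new_row = dict(row)
        for var in value_keys:
            lo_cut, hi_cut = cutoffs[key, var]
            new_row[var] = min(max(row[var], lo_cut), hi_cut)
        result.append(new_row)
    return result


if __name__ == "__main__":
    sample = [
        {"year": 1989, "q": 1, "cogs": 3.564, "sales": 5.194, "sga": 2.153},
        {"year": 1989, "q": 1, "cogs": 81.661, "sales": 108.635, "sga": 12.481},
        {"year": 1989, "q": 2, "cogs": 3.072, "sales": 4.384, "sga": 2.067},
        {"year": 1989, "q": 2, "cogs": 75.966, "sales": 104.566, "sga": 13.819},
    ]
    for r in winsorize_by_group(sample, ["year", "q"], ["cogs", "sales", "sga"]):
        print(r)
```

With only a handful of observations per year-quarter cell, as in the posted sample, cutoffs at 0.5% and 99.5% sit very close to the group minimum and maximum, so the values barely move.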


    • #3
      I wouldn't be surprised if it's trimming you want, not winsorizing. Make sure you know the difference. For strong recommendations against such a pre-analysis procedure, see posts here and here. Find Nick's trimmean command at SSC. I expect that when you try it, you will find that your minuscule trimming fraction has almost no effect. For good Stata commands to detect multivariate outliers and to block the effects of outliers in regression (not detectable by univariate trimming), type "findit mcd".
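
To make the difference between trimming and winsorizing concrete, here is a small Python sketch using a fixed count k of values in each tail. The function names and the toy data are assumptions made for illustration; this is not the code of trimmean, winsor, or winsor2.

```python
def trim(values, k):
    """Trimming: drop the k smallest and k largest values."""
    if 2 * k >= len(values):
        raise ValueError("k is too large for this many values")
    s = sorted(values)
    return s[k:len(s) - k]


def winsorize(values, k):
    """Winsorizing: keep every value, but pull the k smallest up to the
    (k+1)-th smallest and the k largest down to the (k+1)-th largest."""
    if 2 * k >= len(values):
        raise ValueError("k is too large for this many values")
    s = sorted(values)
    lo, hi = s[k], s[-k - 1]
    return [min(max(v, lo), hi) for v in values]


if __name__ == "__main__":
    data = [1, 2, 3, 5, 100]
    print(trim(data, 1))       # [2, 3, 5]        (two observations removed)
    print(winsorize(data, 1))  # [2, 2, 3, 5, 5]  (same length, tails replaced)
```

Trimming shrinks the sample, whereas winsorizing keeps the sample size and replaces the tail values with the nearest retained value.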
      Last edited by Steve Samuels; 17 Jun 2015, 01:42.
      Steve Samuels
      Statistical Consulting

      Stata 14.2


      • #4
        Thanks to Steve for the mention. Note that trimmean and trimplot were written up in the Stata Journal: http://www.stata-journal.com/article...article=st0313 That paper was later reprinted in my Speaking Stata Graphics (College Station, TX: Stata Press, 2014).

        I feel queasy about trimming and Winsorizing whenever the focus moves beyond resistant summary of the level of a distribution to producing variables that are supposedly an improvement on the original data. I'd rather develop a way to use all the data, including outliers and the far tails.

        I'd assert that trimming or Winsorizing time series invokes a tacit assumption of stationarity, so good luck with that.


        • #5
          Steve,

          Thanks for mentioning that. I actually want to winsorize by group rather than trim (trimming by group is not too much trouble in terms of code). I will look at "findit mcd", as it looks pretty relevant to what I'm doing.

          I am a bit suspicious about the outcome of winsor2, though. In the very basic case where you can use "winsor" or "winsor2", both functions sometimes output different results. Has anyone had similar problems? Thanks


          • #6
            I am the author of winsor (SSC). I will happily look at bug reports of specific reproducible problems in which my program demonstrably produces incorrect or puzzling results. (It is a command, not a function.)
