  • Winsorizing before or after log-transformation

    Dear users,

    I have a question regarding the handling of outliers in a log-log regression framework.

    I want to winsorize the independent variable at the 1st and 99th percentile, and I was wondering whether to do that before or after log-transforming it.
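
    For concreteness, here is a minimal Python sketch (not Stata) of the two orders being asked about. It assumes a strictly positive variable and one particular percentile definition, linear interpolation at position (n - 1)p/100. Stata's own commands may define percentiles differently, so the exact cutoffs are illustrative only, and the function names are hypothetical.

```python
import math

def percentile(values, p):
    """p-th percentile (0-100), linear interpolation at position (n - 1) * p / 100.
    This is one of several common definitions, chosen here as an assumption."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def winsorize(values, lower_p=1, upper_p=99):
    """Replace values outside the [lower_p, upper_p] percentiles with those percentiles."""
    lo, hi = percentile(values, lower_p), percentile(values, upper_p)
    return [min(max(v, lo), hi) for v in values]

def winsorize_then_log(values):
    """Option 1: cutoffs computed on the original scale, then everything is logged."""
    return [math.log(v) for v in winsorize(values)]

def log_then_winsorize(values):
    """Option 2: log first, then compute the cutoffs on the log scale."""
    return winsorize([math.log(v) for v in values])

# Hypothetical positive variable: 1, 2, ..., 100 plus one extreme value.
x = [float(i) for i in range(1, 101)] + [1e6]
w = winsorize(x)
print(min(w), max(w))  # 2.0 100.0
```

    With 101 observations the 1st and 99th percentile positions fall exactly on order statistics, so here both options happen to give the same result; with other sample sizes the cutoffs can differ.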

    I should add that I am well aware of the myriad problems with removing outliers in general and winsorizing in particular, and I am considering outlier-robust specifications. I still want to conduct an analysis using winsorization, because much of the reference literature in my field (corporate finance, innovation) winsorizes financial ratios whose numerator or denominator can get close to zero.

    Thanks in advance,
    Christian

  • #2
    The results are almost always going to be different, which is presumably why you're asking.

    I'd expect Winsorizing on the original scale followed by logging usually to take you further away from the original data than the reverse. A case could be made for both choices. Usually logarithms pull in outliers, but they can create outliers too. If there is a very low value at, say, 1e-6 and everything else varies from 1 to 1000, then spot the outlier after taking logs.
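
    A small Python sketch of the point about logs creating outliers, using the numbers above (one value of 1e-6, everything else from 1 to 1000), plus a hypothetical extra case with one value of 1e6 to show the usual pulling-in at the top. The gap ratios are an ad hoc measure made up for this illustration: how far the extreme value sits from its neighbour, relative to the spread of the other values.

```python
import math

def low_gap_ratio(values):
    """Gap below the second-smallest value, relative to the spread of the rest."""
    xs = sorted(values)
    return (xs[1] - xs[0]) / (xs[-1] - xs[1])

def high_gap_ratio(values):
    """Gap above the second-largest value, relative to the spread of the rest."""
    xs = sorted(values)
    return (xs[-1] - xs[-2]) / (xs[-2] - xs[0])

def log10_all(values):
    """Base-10 logarithm of every (positive) value."""
    return [math.log10(v) for v in values]

rest = [float(i) for i in range(1, 1001)]  # everything else varies from 1 to 1000
tiny = [1e-6] + rest                       # one very low value, as in the example
huge = rest + [1e6]                        # hypothetical: one very high value

# Low value: negligible on the raw scale, a clear outlier after logging.
print(round(low_gap_ratio(tiny), 3), round(low_gap_ratio(log10_all(tiny)), 3))
# High value: extreme on the raw scale, pulled in after logging.
print(round(high_gap_ratio(huge), 3), round(high_gap_ratio(log10_all(huge)), 3))
```

    The low value goes from a ratio of about 0.001 to about 2 once logged, while the high value goes the other way, from about 1000 to about 1.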


    To my mind this kind of arbitrariness is yet another reason to feel uncomfortable about the approach. In practice you're best advised to do what is standard in your field, as flak from reviewers is going to be a bigger deal than any scepticism on Statalist.

    The spectrum of opinion on Statalist on Winsorizing seems to run from "this is a bad idea -- other approaches are better" to "this is what people do in some fields". I can't recall anyone saying it's a really good idea.

    I have often regretted posting a command in this territory, but users make their choices.
    Last edited by Nick Cox; 05 Apr 2021, 04:24.



    • #3
      Hi all,

      One thought: shouldn't the order in which you apply these transformations be irrelevant, assuming your data are positive and you winsorize a fixed percentage of them? Logging is a monotonic transformation, after all.



      • #4
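        #3 In principle, yes. In practice, percentiles are often interpolated between order statistics, and because the logarithm is nonlinear, interpolating on the original scale and then logging does not give the same cutoff as interpolating on the log scale. I think that was behind what I said.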
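
        A minimal Python sketch of this point, under stated assumptions: two hypothetical percentile definitions, nearest-rank (always an observed value) and linear interpolation at position (n - 1)p/100, and made-up positive data with a large jump between the two smallest values. With nearest-rank cutoffs the two orders agree exactly, as #3 argues; with interpolated cutoffs they do not. Because log is concave, the logged raw-scale cutoff is never below the log-scale cutoff interpolated between the same two order statistics.

```python
import math

def pct_nearest_rank(values, p):
    """Nearest-rank percentile: always one of the observed values."""
    xs = sorted(values)
    rank = max(1, math.ceil(p * len(xs) / 100))
    return xs[rank - 1]

def pct_interpolated(values, p):
    """Linear interpolation between order statistics at position (n - 1) * p / 100."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def winsorize(values, pct, lower_p=1, upper_p=99):
    """Clip values to the [lower_p, upper_p] percentiles computed with pct."""
    lo, hi = pct(values, lower_p), pct(values, upper_p)
    return [min(max(v, lo), hi) for v in values]

def both_orders(values, pct):
    """Return (winsorize then log, log then winsorize) for positive values."""
    w_then_log = [math.log(v) for v in winsorize(values, pct)]
    log_then_w = winsorize([math.log(v) for v in values], pct)
    return w_then_log, log_then_w

# Hypothetical data: 151 positive values with a big jump from 1 to 100.
x = [0.5, 1.0] + [float(v) for v in range(100, 249)]

a, b = both_orders(x, pct_nearest_rank)
print(a == b)  # True: cutoffs are observed values and log is monotonic

a, b = both_orders(x, pct_interpolated)
print(a == b)  # False
# Lower cutoff on the log scale: log(50.5) if winsorized first, log(10) if logged first.
print(round(min(a), 3), round(min(b), 3))  # 3.922 2.303
```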



        • #5
          This SAS blog has some nice discussion (I think!) of issues to consider: Notice especially the warning about possible loss of symmetry when using quantiles (as you propose to do). I think the point about considering more robust alternative models rather than Winsorizing, trimming or transforming is also a good one.

          HTH.
          --
          Bruce Weaver
          Version: Stata/MP 18.5 (Windows)



          • #6
            I suspect the issues here are cultural as much as statistical. In some fields it seems to be assumed that outliers are something idiosyncratic that you do or even should want to ignore -- a stock rises or falls because of some rumour, or whatever, whereas you care about the general tendency of the market.

            In other fields (mine among them) an outlier is usually real and you should want to include it in the analysis -- unless you can demonstrate that it was wrong.

            To me taking logarithms was the first and remains the best robust method, not that it is always the answer.
