  • Outliers with non-normally distributed panel data

    Hello,

    I have an unbalanced panel data set in which my variables are not normally distributed. One variable also has outliers. These values definitely do not make sense and, in my view, represent input errors that I cannot correct after the fact. I want to identify the outliers and then exclude them from my calculations. Because my variables are not normally distributed, I cannot use many of the common methods for identifying and handling outliers. I have therefore tried the "median absolute deviation" (MAD) and "double MAD" methods. However, in my view, these excluded too many cases that are not outliers. Are there other methods that I can use here?
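
For readers who want to reproduce the two rules mentioned here, the following is a minimal Stata sketch of MAD and double MAD flags. It is not the poster's actual code: the cutoff of 3, the 1.4826 scaling constant (which makes the MAD comparable to a standard deviation under normality), and all new variable and scalar names are assumptions chosen for illustration.

```stata
* Sketch only: flag candidate outliers in sales with MAD and double MAD.
* Assumed: a cutoff of 3 and the names s_med, absdev, flag_mad, and so on.
quietly summarize sales, detail
scalar s_med = r(p50)

generate double absdev = abs(sales - scalar(s_med))

* Plain MAD: one median of absolute deviations for the whole variable
quietly summarize absdev, detail
scalar s_mad = r(p50)
generate byte flag_mad = absdev > 3 * 1.4826 * scalar(s_mad) if !missing(sales)

* Double MAD: separate MADs below and above the median, intended for skewed data
quietly summarize absdev if sales <= scalar(s_med), detail
scalar s_mad_lo = r(p50)
quietly summarize absdev if sales >= scalar(s_med), detail
scalar s_mad_hi = r(p50)
generate double mad_side = cond(sales <= scalar(s_med), scalar(s_mad_lo), scalar(s_mad_hi)) if !missing(sales)
generate byte flag_dmad = absdev > 3 * 1.4826 * mad_side if !missing(sales)

* Compare how many observations each rule flags
tabulate flag_mad flag_dmad
```

On a strongly right-skewed variable, rules of this kind can flag a large share of the upper tail, which would be consistent with the experience described in this post.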

    I use Stata 14.2 and here is also some information about the variable called "sales":

    tabstat sales
    [Image: tabstat.PNG]


    qnorm sales
    [Image: qnorm.PNG]



    graph box sales
    [Image: graph box.PNG]


    Thanks for the support.

  • #2
    Two quite different kinds of issue are tangled together here. Whether a value like 1.18 billion is utterly wrong is a substantive issue on which we can't comment. But I don't see anything obviously pathological in the display. You'd learn more from quantile sales, ysc(log) (which will need some work on the axis labels).
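
A minimal sketch of that suggestion with explicit axis labels might look as follows. The label positions (one thousand, one million, one billion) and their text are placeholders, not values read from the data, and the if qualifier is an added assumption because a logarithmic scale needs positive values.

```stata
* Sketch only: quantile plot of sales on a logarithmic y axis.
* The tick positions and label text are assumed; adjust them to the data.
quantile sales if sales > 0, yscale(log) ylabel(1000 "1k" 1000000 "1m" 1000000000 "1bn", angle(horizontal))
```

Supplying the label text by hand avoids the default display of large values in exponential format.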



    • #3
      Thank you very much for your reply. I followed your hint and obtained the following graph:

      quantile sales, rlopts(connect(ascending)) yscale(log) ylabel(minmax) xmtick(minmax)
      [Image: Graph.png]


      If I understand the graph correctly, there are anomalies in the data in the first and last quantiles. In particular, I would like to deal with the supposed outliers in the first quantile and unified values in the last quantile. The only problem is the non-normal distribution of the data.



      • #4
        I would never expect sales to be normally distributed.

        Identifying outliers that make no substantive sense is a substantive issue, and you need substantive knowledge to make an informed decision. But I will guess that this is panel data: say, sales for several different firms over several years. If so, you could plot the time series for, say, the firms with the lowest medians and the firms with the highest medians as a further check.
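
One way to set up such a check is sketched below. The names firm_id (a numeric panel identifier) and year (the time variable) are hypothetical, as is the choice of five firms at each end; the sketch also assumes each firm has at most one observation per year.

```stata
* Sketch only: time series of sales for the firms with the lowest and highest medians.
* Assumed names: firm_id (numeric panel identifier) and year (time variable).

* Median sales per firm, and one tagged observation per firm
bysort firm_id: egen double med_sales = median(sales)
egen byte firm_tag = tag(firm_id)

* Rank firms by their median (1 = lowest) and copy the rank to every observation
egen double rank_med = rank(med_sales) if firm_tag, unique
bysort firm_id (rank_med): replace rank_med = rank_med[1]

* Number of firms with a nonmissing median
quietly count if firm_tag & !missing(med_sales)
local nfirms = r(N)

* Declare the panel structure, then plot the five lowest and five highest medians
xtset firm_id year
xtline sales if inrange(rank_med, 1, 5), overlay name(low_median, replace)
xtline sales if inrange(rank_med, `nfirms' - 4, `nfirms'), overlay name(high_median, replace)
```

Looking at whole series in this way shows whether a suspicious value is an isolated spike within a firm or simply part of a consistently small or large firm.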

        There isn't a general purpose method that will reliably identify which data points are bad and should be ignored.



        • #5
          (The thread title does say that this is panel data!)



          • #6
            There isn't a general purpose method that will reliably identify which data points are bad and should be ignored.
            As stupid as it sounds, this is a great insight for me, thank you very much! I always thought that in statistics all paths are pre-drawn. The fact that I can proceed independently based on the theory and data is wonderful.



            • #7
              For more on what I think see if you wish e.g. https://stats.stackexchange.com/ques...iers-with-mean !!! (The answers there range wider than the question.)

              To be fair, there isn't one united view on all this. There are fields where the line is that a small fraction of a big dataset is just bad or at least freakish and we can't possibly drill down to see what is genuine, so just throw out the outliers.

              As a geographer my experience is mostly the opposite: the outlier is a big flood, or a big glacier, or the Amazon, or Amazon, or the populations of China and India, and almost always very real.

