  • Boxplot problem of flat line

    Hello,
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(fundnr year mf_m_cf)
     444 2009     .9111743
    1606 2003    1.8259066
    3484 2010     9.627364
     743 2016     .4432602
    3691 2011     4.885561
    1935 2015   -1.8404514
     444 1993    .01043051
     984 1993 -.0044249827
    2524 1996     .9388815
     610 1992            .
    1417 1995    10.157868
     444 2015     .9147166
     527 1990            .
     767 2007      31.3497
     444 1991   -.10612977
     444 1999     .6728972
     402 2014     -.705271
    1219 1999   -4.4375644
     444 2005   -2.2521157
     444 1992      4.22313
     560 2004    -.3882031
    3580 2009   -.20625106
     527 1991     6.328609
     979 1996    -5.362143
    1060 2001    -3.597908
     417 2002   -1.1781628
     811 2008    -1.620225
     444 1991     .1711222
     767 2009   -1.3755553
    1669 2001   -.14327554
    2708 2003     .9402666
    3347 2008    .13227251
    2708 2004            .
     444 2004   -2.2446256
    1146 2004    .08893927
    3248 2006     9.885372
     444 1999     .3944024
    3396 2007     3.144457
    3705 2017    12.843843
    3460 1994            .
     335 2008     -.675634
     579 2003   -2.0698159
     940 2010     1.912574
    2708 2003     17.15932
    2708 2003   -.05014902
    3079 2012    1.4607174
    2708 2004            .
     453 2008     .4533027
     862 2014   -1.2433937
     444 2002    -2.389034
     650 2013    -.2421079
    1117 2017    .12475158
    1417 1993            .
    2793 2013    -.9765097
    1023 2017   -1.9454733
    3487 2013    -2.230105
     799 1994   .002836363
     993 2015    -1.622088
    1320 2015     .3555873
     439 1995     2.656444
    2994 2010    .04025678
    3553 2014     .1185691
    1064 1995    -.6680014
    3296 2003     .9756665
    1925 2012    .25422385
     444 2001     4.793575
     778 2015     .9230105
     995 1993     4.540443
     444 2006   -1.2819324
     612 1994     .9295382
    3250 2011    -3.901114
    3487 2013    1.1448343
    2413 2005     5.719396
     918 1996    1.2646472
     444 2003   -1.8748924
    1246 2018   -.25754893
     865 1994    -.3397882
    3386 2018     -.932922
    3487 2008     3.253638
     869 2009    -.3591767
     995 2003    -.2239342
    1608 2000            .
    1104 2007   -1.0762625
     630 2005   -1.1473552
    2735 2007     3.981054
     353 2000     1.657574
    3487 2009     9.839337
    1671 1996     9.620604
     650 2016    -1.475633
     444 1994     2.024926
     444 1994    -.3641732
     444 1995   -1.7175412
    1868 2002     -17.2737
    2809 2010   -2.0511134
    1082 2004     1.545442
    3386 2014    1.5415833
     428 2012    -.4344161
    3306 2015   -4.4798098
     829 2007    1.4708675
     350 2013   -.21686088
    end
    I am trying to find the outliers with a box plot, but no box appears when I run 'graph boxplot mf_m_cf': I can see the outliers, but the box itself is just a flat line. The outliers have a strong effect on my average flows, which makes the result much higher than intended.

  • #2
    Code:
    quantile mf_m_cf, ms(oh)

    will give you a much richer appreciation of the distribution.

    As a step further in the same direction, I used qplot (Stata Journal) and transplot (SSC) on your data example, with the original data and the so-called neglog transformation sign(y) * ln(1 + |y|), which behaves like y near 0, like ln y for y >> 0 and like -ln(-y) for y << 0.


    Code:
    transplot qplot mf_m_cf, trans(@   sign(@)*log1p(abs(@)))

    [Image: neglog.png]


    The "outliers" don't look that extraordinary -- in this data example, viewed on a reasonable scale.
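
    To check this without installing transplot, the neglog transformation can be computed by hand and the box plot redrawn on that scale. This is only a minimal sketch: it assumes mf_m_cf is in memory, and neglog_cf is a hypothetical variable name. If log1p() is not available in your Stata, ln(1 + abs(mf_m_cf)) is the direct equivalent.

    Code:
    * neglog transform: sign(y) * ln(1 + |y|); missing values stay missing
    generate double neglog_cf = sign(mf_m_cf) * log1p(abs(mf_m_cf))
    label variable neglog_cf "neglog of mf_m_cf"

    * box plot on the transformed scale, where the extreme values stretch the axis far less
    graph box neglog_cf

    * a value z on the neglog scale maps back to the original scale as
    *     sign(z) * (exp(abs(z)) - 1)

    On the original scale the box covers only the interquartile range, so a few very large values stretch the axis until the box looks like a flat line. The transformation pulls those values in while leaving values near zero almost unchanged.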
    Last edited by Nick Cox; 31 Mar 2022, 10:14.



    • #3
      Thank you. The cash flows go up to 2.50e+07, but that value occurs only once and is not shown in my example dataset.
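
      A quick way to see how much a single extreme value pulls on the average is to put the mean next to the median and quartiles, then list the observations in the upper tail. This is a rough sketch, assuming the full dataset is in memory; the 99th percentile cutoff is an arbitrary choice for illustration, not a rule for defining outliers.

      Code:
      * compare mean and median, with quartiles and range
      tabstat mf_m_cf, statistics(n mean median p25 p75 min max)

      * detailed summary leaves percentiles behind in r()
      summarize mf_m_cf, detail

      * list the upper tail; missing values count as larger than any number, so exclude them
      list fundnr year mf_m_cf if mf_m_cf >= r(p99) & !missing(mf_m_cf)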
