  • When to winsorize the data

    Hello everyone,

    For the first time I am working with variables that are aggregated at the country-year level. I noticed that the point in my code at which I winsorize the data (upper and lower 1%) has a huge influence on the descriptive statistics and the regression results.
    Originally I learned to winsorize just before conducting the first (descriptive) analysis. As my final variables in this analysis will consist mainly of year- and country-level means, it seems wrong to me to include all the outliers in the means, and winsorizing the means at the end would logically change nothing about the outliers that already went into them.
    Do I winsorize just once at the beginning, for all basic variables? Or right before building the mean of a variable? I was also wondering whether trimming the outliers completely would be a better approach when creating further variables.

    These are three examples of how I construct the variables:
    cid = country identifier, Year = time variable

    In this first code, the variables of interest are BTD and BTC.
    Code:
    gen PTBI = PLBT/l.TOAS
    generate Taxation = TAXA/l.TOAS
    gen temp = PTBI - (Taxation/STAX)
    gen PermBTD = abs(temp)
    drop temp
    
    bys cid Year : egen BTD= mean(PermBTD)
    
    * (... Ranking)
    
    * Scaling:
    tabstat BTD_rank, by(BTD)
    egen rank = max(BTD_rank)
    gen BTD_scale = BTD_rank/rank
    
    *Average rank over a three year period
    bys cid : egen BTC = mean(BTD_scale)
    Second code: (Here, I left in one possible point where I winsorized the data as an example, but I am not sure whether this is a statistically correct way to do it.)
    Here, the variables of interest are DACR_ABS and DACR_ABS_c, but I also continue to calculate with variable TACR_t

    Code:
    sort fid Year
    gen TACR_t = (d.CUAS-d.CASH)-(d.CULI-d.LOAN)-DEPR
    gen TACR = ((d.CUAS-d.CASH)-(d.CULI-d.LOAN)-DEPR)/l.TOAS
    gen PPE = (TFAS+OFAS)/l.TOAS
    gen pseudo_intercept = 1/l.TOAS
    gen d_REV_REC = (d.OPRE-d.DEBT)/l.TOAS
    
    gen DACR = .
    
    sum ff48
    forvalues X = `r(min)'/`r(max)' {
    capture {
    reg TACR pseudo_intercept d_REV_REC PPE ROA if ff48==`X'
    predict res, res
    replace DACR = res if ff48==`X' & `e(N)'>=100
    drop res
    }
    }
    
    gen DACR_ABS = abs(DACR)
    winsor2 DACR_ABS, cuts(1, 99)
    sum DACR DACR_ABS
    
    bys cid Year ff48: egen DACR_ABS_c = median(DACR_ABS)
    This is example 3, where I continue to work with the variable TACR_t, and the variable of interest is EM1.
    Code:
    sort fid Year
    gen CFO = (PLAT - TACR_t)
    winsor2 CFO OPPL, cuts(1,99)
    gen CFO_scale = CFO/l.TOAS
    gen OPPL_scale = OPPL/l.TOAS
    bys fid : egen OPPL_sd = sd(OPPL_scale)
    bys fid : egen CFO_sd = sd(CFO_scale)
    bys fid: gen temp = OPPL_sd/CFO_sd
    bys cid Year: egen EM1 = median(temp)
    Thank you in advance for your help!

    This is the dataset. In addition to the variables winsorized in the code above, I also winsorized all variables at the beginning. This way, the dataset best shows the relations as they should be, but I highly doubt that this is a good procedure for the final analysis.
    (I could not find a way to attach it in a more readable form; the reader's view always deforms it. I am happy to adapt it if there is a special command I have not found.)

    Code:
     
    fid cid Year PLBT TOAS PTBI Taxation PermBTD BTD BTC TACR TACR_t DACR DACR_ABS DACR_ABS_c CFO CFO_scale OPPL_scale EM1
    2817 4 2017 4745.08385 58299.8695 0.0824614 0.02482222 0.00086259 6.588336 0.4074074 -0.00088067 -50.67625 0.04009051 0.04009051 0.10928206 3367.413 0.05851985 0.0810942 0.4120621
    2817 4 2018 2253.9957 84180.0186 0.03866211 0.01319829 0.00533219 0.05211727 0.4074074 -0.09503116 -5540.305 -0.01275468 0.01275468 0.03686274 7024.842 0.12049498 0.02562361 0.4120621
    2817 4 2019 2727.70514 89486.4554 0.03240324 0.01035516 0.00211395 0.05530543 0.4074074 -0.04133679 -3479.732 0.01791139 0.01791139 0.05070473 5335.74 0.06338488 0.02341303 0.4120621
    18086 11 2017 16.961 27413.101 0.00067381 -0.00022978 0.00163122 1.5529467 0.345679 -0.06771223 -1704.45 -0.0425133 0.0425133 0.03222278 1727.195 0.06861581 0.00066975 0.3348516
    18086 11 2018 40.109 29163.157 0.00146313 0.00060219 0.00104601 0.04311743 0.345679 -0.00983296 -269.552 0.02834461 0.02834461 0.02694934 293.153 0.0106939 0.0030164 0.3348516
    18086 11 2019 71.269 31958.754 0.0024438 0.00094057 0.00147524 0.10434106 0.345679 -0.03144594 -917.063 0.00040662 0.00040662 0.05091326 960.902 0.03294918 0.00423219 0.3348516
    12461 26 2017 2568.62122 20326.4696 0.11789087 0.02313979 0.0038975 2566.7444 0.04938272 -0.03096913 -674.7593 -0.01320734 0.01320734 0.06305698 2739.208 0.1257202 0.11487105 0.352464
    18141 11 2017 664.516 26967.073 0.0312878 0.00102609 0.02701242 1.5529467 0.345679 -0.04669478 -991.742 -0.00490222 0.00490222 0.03222278 1634.465 0.0769565 0.04399341 0.3348516
    12842 26 2017 484.04419 28529.2725 0.01753328 0.00106208 0.01194338 2566.7444 0.04938272 0.02884954 796.4542 0.05522585 0.05522585 0.06305698 -341.7311 -0.01237835 0.02316142 0.352464
    12461 26 2018 2079.37905 23069.4666 0.10229908 0.02313239 0.01945036 4039.437 0.04938272 -0.06566333 -1334.7036 -0.02738268 0.02738268 0.06723364 2943.883 0.14483002 0.10622934 0.352464
    14735 26 2017 5457.38167 67510.9793 0.0867826 0.01754733 0.00557178 2566.7444 0.04938272 0.02153486 1354.2338 0.04129442 0.04129442 0.06305698 2999.672 0.04770041 0.11926562 0.352464
    14735 26 2019 3117.17516 55688.2856 0.04376696 0.0088075 0.00258829 316.06085 0.04938272 -0.23890907 -17015.607 -0.2184093 0.2184093 0.06154155 19505.494 0.27386853 0.06764886 0.352464
    18141 11 2019 626.186 30695.16 0.0217053 0.0007554 0.01855778 0.10434106 0.345679 -0.10916464 -3149.34 -0.06459392 0.06459392 0.05091326 3753.733 0.13011454 0.0289237 0.3348516
    18141 11 2018 430.108 28849.452 0.01594938 0.00080813 0.01258215 0.04311743 0.345679 -0.05097769 -1374.719 -0.00362433 0.00362433 0.02694934 1783.034 0.06611893 0.02434246 0.3348516
    12842 26 2018 970.202841 29755.2765 0.03400728 -0.00915202 0.0821758 4039.437 0.04938272 -0.05048295 -1440.242 -0.00641199 0.00641199 0.06723364 2671.545 0.09364225 0.04103162 0.352464
    14735 26 2018 5227.51659 71222.107 0.0774321 0.01587822 0.00613747 4039.437 0.04938272 0.05890192 3976.5264 0.07906541 0.07906541 0.06723364 179.0363 0.00265196 0.11028625 0.352464
    12461 26 2019 -2299.15134 28575.4999 -0.0996621 -0.02270334 0.01982916 316.06085 0.04938272 0.01988966 458.8439 0.09009517 0.09009517 0.06154155 -2234.2415 -0.09684842 -0.13086022 0.352464
    12842 26 2019 2503.68624 31723.6477 0.0841426 0.0156359 0.00184841 316.06085 0.04938272 -0.1142029 -3398.139 -0.08337249 0.08337249 0.06154155 5436.575 0.1827096 0.09041428 0.352464
    13900 26 2018 1236.40673 92946.9044 0.01309846 0.00023881 0.01184159 4039.437 0.04938272 -0.04314405 -4072.5085 -0.01761709 0.01761709 0.06723364 5286.374 0.0560037 0.01891337 0.352464
    13900 26 2019 188.694897 83797.0865 0.00203014 -0.00112785 0.00796621 316.06085 0.04938272 -0.09391215 -8728.844 -0.02832236 0.02832236 0.06154155 9022.369 0.09707014 -0.00060548 0.352464
    8013 23 2017 1573.127 36502.566 0.05244766 0.00929099 0.01528371 0.067271 0.4074074 -0.2179192 -6536.318 -0.1386121 0.1386121 0.04456513 7830.769 0.26107588 0.06902349 0.3552493
    13900 26 2017 1587.62673 94393.2799 0.01705955 0.00158864 0.00869829 2566.7444 0.04938272 0.03076962 2863.537 0.06350838 0.06350838 0.06305698 -1423.755 -0.0152987 0.02263202 0.352464
    8013 23 2018 4466.796 39863.337 0.12236937 0.02506813 0.02209686 0.1677606 0.4074074 -0.12258103 -4474.522 -0.12470427 0.12470427 0.05864691 8026.267 0.21988226 0.11992294 0.3552493
    8013 23 2019 3622.78 40713.513 0.09088 0.01692011 0.02319957 0.04799062 0.4074074 0.00227954 90.87 0.02979955 0.02979955 0.03129168 2857.418 0.07168035 0.1044076 0.3552493
    11902 26 2018 25819.0465 172364.007 0.1548661 0.03451859 0.02681069 4039.437 0.04938272 0.0924698 15416.427 0.0939441 0.0939441 0.06723364 4647.7314 0.02787772 0.11226112 0.352464
    11902 26 2017 24302.9497 166718.504 0.1631501 0.03328214 0.01201905 2566.7444 0.04938272 0.075545 11253.236 0.07336272 0.07336272 0.06305698 8091.983 0.05432295 0.12743378 0.352464
    11902 26 2019 19740.463 185782.903 0.11452775 0.0255132 0.01975226 316.06085 0.04938272 0.01817813 3133.2556 0.03322161 0.03322161 0.06154155 12209.65 0.07083643 0.07340663 0.352464
    11150 8 2017 10238 166354 0.0655807 -0.01958838 0.1243517 17139.715 0.1234568 -0.09712195 -15162 -0.08349518 0.08349518 0.0546836 28458 0.18229103 0.05791318 0.316187
    11150 8 2018 16669 184658 0.10020198 -0.01543696 0.14698064 54.27909 0.1234568 -0.1066581 -17743 -0.09947996 0.09947996 0.04412192 36980 0.22229703 0.08493935 0.316187
    11150 8 2019 9385 189738 0.05082368 -0.0223765 0.11863127 0.07839143 0.1234568 -0.21499205 -39700 -0.19587246 0.19587246 0.07554021 53217 0.2881922 0.04996263 0.316187
    10568 8 2018 22686 254017 0.09363855 -0.00737188 0.11597759 54.27909 0.1234568 -0.10753203 -26052 -0.09525875 0.09525875 0.04412192 50524 0.20854247 0.09363855 0.316187
    10568 8 2017 39724 242272 0.16799957 0.01399855 0.1259997 17139.715 0.1234568 -0.06048137 -14301 -0.06397611 0.06397611 0.0546836 50715 0.21448237 0.1534343 0.316187
    10568 8 2019 32951.921 264412.112 0.1297233 -0.00303546 0.13892165 0.07839143 0.1234568 -0.02492523 -6331.432 -0.02063536 0.02063536 0.07554021 40054.41 0.15768397 0.12574823 0.316187
    7839 23 2018 1066.55 23770.879 0.04716855 0.00537687 0.02566106 0.1677606 0.4074074 -0.04857634 -1098.3821 -0.01868373 0.01868373 0.05864691 2043.353 0.090368 0.06172878 0.3552493
    7839 23 2017 323.52732 22611.4635 0.01658735 0.00573223 0.00634157 0.067271 0.4074074 0.08525405 1662.8342 0.1299022 0.1299022 0.04456513 -1451.111 -0.07439893 0.06080423 0.3552493
    14775 26 2019 2853.7693 28115.5208 0.12262114 0.02639048 0.01627614 316.06085 0.04938272 0.1648402 3836.336 0.1901969 0.1901969 0.06154155 -1596.754 -0.06860954 0.12872753 0.352464
    14775 26 2018 3337.32038 23273.0605 0.15273593 0.02755729 0.00769756 4039.437 0.04938272 0.0653158 1427.1674 0.05709546 0.05709546 0.06723364 1308.019 0.05986284 0.16197 0.352464
    14775 26 2017 3033.52764 21850.2638 0.16619253 0.0341322 0.01345061 2566.7444 0.04938272 0.1577564 2879.5425 0.15837006 0.15837006 0.06305698 -469.033 -0.02569608 0.17896764 0.352464
    7839 23 2019 -816.47 28662.921 -0.03434749 -0.00745576 0.00452444 0.04799062 0.4074074 -0.1785863 -4245.153 -0.11563886 0.11563886 0.03129168 3605.913 0.15169455 -0.03064561 0.3552493
    22006 19 2017 1151.99686 114638.249 0.00709595 0.00082249 0.00195539 0.04390508 0.6419753 -0.00936604 -1520.5367 0.02837502 0.02837502 0.04112589 2539.006 0.0156395 0.02977014 0.4174105
    22006 19 2019 9529.25346 109923.574 0.06893009 0.00342104 0.04754856 0.03877004 0.6419753 0.08016077 11081.84 0.10077485 0.10077485 0.02452792 -2025.5293 -0.01465172 0.02262637 0.4174105
    22006 19 2018 1895.08882 138245.186 0.01653103 0.00186648 0.00486554 0.06418661 0.6419753 -0.07036359 -8066.359 -0.05762959 0.05762959 0.05451224 9747.478 0.08502814 0.02967739 0.4174105
    21860 19 2017 17884.8777 101952.894 0.1761151 0.02144755 0.04206789 0.04390508 0.6419753 0.00658196 668.4124 -0.00370075 0.00370075 0.04112589 15038.42 0.14808556 0.19318733 0.4174105
    15868 12 2017 405.159369 61481.5362 0.00679402 0.00216759 0.00524815 0.02964047 0.7530864 -0.1118641 -6670.981 -0.06356306 0.06356306 0.06356306 6946.877 0.11649054 0.02669115 0.3491954
    15868 12 2018 829.631453 67128.6433 0.01349399 0.00356983 0.00633842 0.03924666 0.7530864 -0.08523504 -5240.381 -0.03262254 0.03262254 0.14398985 5850.534 0.0951592 0.02983445 0.3491954
    15868 12 2019 1793.03987 72803.4824 0.0267105 0.00534231 0.00296901 0.02611423 0.7530864 -0.03665513 -2460.609 0.01686677 0.01686677 0.05952562 3895.0266 0.05802332 0.04456156 0.3491954
    21860 19 2018 20213.8193 124859.174 0.19826627 0.02942579 0.01435509 0.06418661 0.6419753 -0.07651548 -7800.975 -0.08737852 0.08737852 0.05451224 25014.75 0.24535596 0.20482655 0.4174105
    21860 19 2019 19188.4224 156810.502 0.1536805 0.0004183 0.15106615 0.03877004 0.6419753 0.0177234 2212.9292 0.01063253 0.01063253 0.02452792 16923.266 0.13553883 0.16577235 0.4174105
    15352 26 2017 8898.01596 148165.823 0.05685988 0.02457833 0.07249975 2566.7444 0.04938272 -0.05788236 -9058.024 -0.01704199 0.01704199 0.06305698 14109.772 0.09016392 0.05547042 0.352464
    15352 26 2019 -12908.9227 137719.703 -0.08500778 -0.03222789 0.08461267 316.06085 0.04938272 -0.04811766 -7306.945 0.01679573 0.01679573 0.06154155 -707.9871 -0.00466223 -0.08096199 0.352464
    15352 26 2018 9480.03726 151855.783 0.06398262 0.01373194 0.00829076 4039.437 0.04938272 -0.0199172 -2951.0476 0.01835664 0.01835664 0.06723364 10396.48 0.07016787 0.06365335 0.352464
    17636 11 2017 20788.893 199735.28 0.12279417 0.03162026 0.0089569 1.5529467 0.345679 0.10901812 18456.627 0.13301103 0.13301103 0.03222278 -3021.002 -0.01784421 0.132197 0.3348516
    17636 11 2018 17602.427 240609.166 0.08812878 0.02222998 0.00449613 0.04311743 0.345679 -0.01779262 -3553.814 0.02097514 0.02097514 0.02694934 16716.13 0.08369143 0.1039849 0.3348516
    17636 11 2019 19052.031 298984.014 0.07918248 0.00422377 0.06158346 0.10434106 0.345679 -0.04982003 -11987.156 -0.0149379 0.0149379 0.05091326 30022.91 0.12477875 0.08621065 0.3348516
    11532 8 2019 7248 424925 0.01762235 0.00059082 0.015832 0.07839143 0.1234568 0.19893864 81822.63 0.24898374 0.24898374 0.07554021 -74817.63 -0.1819071 0.00343305 0.316187
    11532 8 2018 8195.561 411295.828 0.02114044 0.00029804 0.02023728 54.27909 0.1234568 0.0669193 25942.75 0.11734951 0.11734951 0.04412192 -17862.734 -0.0460769 0.00767275 0.316187
    11532 8 2017 7780.86 387672.198 0.02074138 0.00067225 0.01872444 17139.715 0.1234568 -0.1572441 -58988.07 -0.10642534 0.10642534 0.0546836 66516.75 0.17731322 0.00817811 0.316187
    17663 11 2017 6101.546 211228.124 0.03217998 -0.00070175 0.03510394 1.5529467 0.345679 -0.06836262 -12962.026 -0.02247254 0.02247254 0.03222278 19196.629 0.10124435 0.00412878 0.3348516
    17663 11 2018 12934.52 211275.583 0.06123484 0.00340757 0.04703664 0.04311743 0.345679 -0.01143031 -2414.402 0.02694934 0.02694934 0.02694934 14629.148 0.06925758 0.01549151 0.3348516
    17663 11 2019 7828.127 218236.25 0.03705173 0.00193938 0.02897098 0.10434106 0.345679 -0.0342764 -7241.766 0.01046947 0.01046947 0.05091326 14660.15 0.06938876 0.01033848 0.3348516
    8278 23 2019 7261.404 208579.005 0.0313216 0.00722673 0.00241466 0.04799062 0.4074074 -0.05143012 -11923.238 -0.01453135 0.01453135 0.03129168 17509.242 0.07552499 0.03894991 0.3552493
    8278 23 2018 13004.245 231833.76 0.05400398 0.01309377 0.0016289 0.1677606 0.4074074 0.03385944 8153.408 0.06116798 0.06116798 0.05864691 1697.8358 0.00705077 0.06421751 0.3552493
    8278 23 2017 16108.12 240801.611 0.1705609 0.03849348 0.01658697 0.067271 0.4074074 -0.2437021 -23015.72 -0.2324757 0.2324757 0.04456513 35488.44 0.3757696 0.1901406 0.3552493
    8032 23 2017 3304.38802 60685.9224 0.05850715 0.01465877 0.00012795 0.067271 0.4074074 0.01695952 957.8459 0.04612201 0.04612201 0.04456513 1518.6385 0.02688886 0.0694515 0.3552493
    8032 23 2019 3960.757 70241.9783 0.05780673 0.0144351 6.6321E-05 0.04799062 0.4074074 -0.05731983 -3927.396 -0.03154258 0.03154258 0.03129168 6899.1 0.10069146 0.06625017 0.3552493
    8032 23 2018 3567.17534 68517.2258 0.05878093 0.01488304 0.00075123 0.1677606 0.4074074 0.01611146 977.7385 0.0437501 0.0437501 0.05864691 1686.2457 0.02778644 0.06972042 0.3552493
    7893 23 2018 543.53 42135.957 0.01235401 0.00143683 0.0066067 0.1677606 0.4074074 -0.159011 -6995.887 -0.11456939 0.11456939 0.05864691 7476.202 0.1699282 0.01846981 0.3552493
    7893 23 2019 1233.669 42435.724 0.0292783 0.00507234 0.00898893 0.04799062 0.4074074 -0.12777655 -5383.987 -0.1148497 0.1148497 0.03129168 6403.928 0.1519825 0.02939646 0.3552493
    7893 23 2017 11.242 43996.242 0.00022848 0.00226717 0.0088402 0.067271 0.4074074 -0.07958093 -3915.5986 -0.01635631 0.01635631 0.04456513 3815.2896 0.07754225 0.00606692 0.3552493
    20198 17 2018 7301.43979 44964.4881 0.1512755 0.02992853 0.00624304 0.03684286 0.5432099 -0.08132406 -3925.1736 -0.0632997 0.0632997 0.0557246 9782.088 0.20267105 0.1510382 0.3894069
    20198 17 2017 5971.84558 48265.8377 0.1455135 0.02691003 0.00388176 21.34573 0.5432099 -0.13546728 -5559.551 -0.1075591 0.1075591 0.05372737 10427.014 0.25407076 0.14603199 0.3894069
    20198 17 2019 6458.71128 50239.4159 0.14364026 0.02857417 0.00675008 0.0394582 0.5432099 -0.0754589 -3392.971 -0.05265442 0.05265442 0.04746133 8566.859 0.190525 0.14515823 0.3894069
    12403 26 2018 1831.69843 21084.4899 0.08434094 0.01878781 0.01454229 4039.437 0.04938272 . . . . 0.06723364 . . 0.10011658 0.352464
    12403 26 2017 1607.68154 21717.7834 0.07482594 0.01656972 0.01238314 2566.7444 0.04938272 . . . . 0.06305698 . . 0.08987502 0.352464
    14014 26 2017 1244.91107 40066.984 0.03094948 0.00028009 0.02947534 2566.7444 0.04938272 . . . . 0.06305698 . . 0.07074968 0.352464
    8035 23 2019 5568.347 71455.611 0.08320484 0.02367002 0.01147524 0.04799062 0.4074074 0.01413187 945.752 0.03291081 0.03291081 0.03129168 3038.518 0.04540296 0.08470124 0.3552493
    12403 26 2019 1500.9017 26204.4589 0.07118511 0.01600973 0.0130766 316.06085 0.04938272 . . . . 0.06154155 . . 0.0896877 0.352464
    8035 23 2017 -273.036 61265.652 -0.00523697 0.00071536 0.00809839 0.067271 0.4074074 -0.21616824 -11270.207 -0.2061729 0.2061729 0.04456513 10959.875 0.21021593 -0.00509223 0.3552493
    8035 23 2018 5358.366 66923.356 0.08746117 0.02337178 0.00602593 0.1677606 0.4074074 0.07281438 4461.02 0.1318822 0.1318822 0.05864691 -534.541 -0.00872497 0.09060641 0.3552493
    14014 26 2018 -732.901126 52805.4707 -0.0182919 -0.00110692 0.01246599 4039.437 0.04938272 . . . . 0.06723364 . . 0.02692589 0.352464
    14014 26 2019 905.213088 54720.4232 0.01714241 . . 316.06085 0.04938272 0.03883414 2050.655 . . 0.06154155 . . 0.05764274 0.352464
    12344 26 2017 1558.11042 33744.413 0.05687201 0.00583935 0.02613857 2566.7444 0.04938272 0.11494425 3149.1035 0.14114691 0.14114691 0.06305698 -1750.9727 -0.0639116 0.05189622 0.352464
    12344 26 2018 2319.55999 34539.4901 0.06873908 0.01508186 0.0106391 4039.437 0.04938272 -0.10589373 -3573.322 -0.07338718 0.07338718 0.06723364 5383.954 0.15955096 0.07297777 0.352464
    10458 8 2019 -7867.585 137525.796 -0.05910061 -0.02460579 0.0154624 0.07839143 0.1234568 0.03212542 4276.597 0.08705228 0.08705228 0.07554021 -8868.612 -0.06662024 -0.03256869 0.316187
    10458 8 2018 971.501 133121.895 0.00753625 -0.01997768 0.06807468 54.27909 0.1234568 0.0194716 2510.093 0.04790367 0.04790367 0.04412192 1036.74 0.00804233 0.01482398 0.316187
    10458 8 2017 -2951.909 128910.446 -0.02373426 -0.02512494 0.0516481 17139.715 0.1234568 -0.0283303 -3523.535 0.00732802 0.00732802 0.0546836 3696.499 0.02972099 -0.00588877 0.316187
    3768 4 2017 2332.49638 44696.1628 0.0614609 0.0259248 0.02556429 6.588336 0.4074074 -0.09731121 -3693.048 -0.05046789 0.05046789 0.10928206 5041.675 0.13284731 0.0663941 0.4120621
    12344 26 2019 4867.12637 46453.1997 0.14091483 0.03006318 0.01731245 316.06085 0.04938272 -0.02570677 -887.8986 0.01008846 0.01008846 0.06154155 4716.658 0.13655841 0.1474753 0.352464
    3768 4 2018 4895.46861 56127.273 0.10952771 0.0217452 0.03704372 0.05211727 0.4074074 -0.00800858 -357.9529 0.02108388 0.02108388 0.03686274 4281.4946 0.0957911 0.0657604 0.4120621
    3768 4 2019 1402.30024 62611.8765 0.02498429 0.0052699 0.00741797 0.05530543 0.4074074 -0.00050883 -28.55952 0.05070473 0.05070473 0.05070473 1135.0748 0.02022323 0.02352101 0.4120621
    2296 24 2017 974.924424 27764.0022 0.03976626 0.00790245 0.0018256 0.1740682 0.7037037 -0.04352385 -1067.047 0.00173573 0.00173573 0.02452792 1848.232 0.07538765 0.04796743 0.4077041
    2296 24 2018 914.025062 28241.0108 0.03292123 0.00387152 0.0125448 0.02991877 0.7037037 -0.0698714 -1939.9098 -0.03288082 0.03288082 0.13556993 2746.446 0.09892111 0.04137136 0.4077041
    2296 24 2019 889.958285 30771.0082 0.03151298 0.00495491 0.00543453 0.02611423 0.7037037 -0.0601538 -1698.804 -0.02766701 0.02766701 0.06405019 2448.831 0.08671187 0.04287749 0.4077041
    2037 24 2017 -3119.57017 52444.5426 -0.06046965 . . 0.1740682 0.7037037 -0.05948853 -3068.955 0.0133148 0.0133148 0.02452792 -50.6151 -0.00098112 -0.05378319 0.4077041
    2037 24 2018 535.423068 55384.313 0.01020932 . . 0.02991877 0.7037037 0.09437573 4949.492 0.13556993 0.13556993 0.13556993 -4414.069 -0.0841664 0.00792254 0.4077041
    2037 24 2019 1071.6017 57849.8857 0.01934847 9.95E-06 0.01929612 0.02611423 0.7037037 -0.07419669 -4109.3325 -0.03215412 0.03215412 0.06405019 5180.383 0.0935352 0.01742513 0.4077041
    17238 11 2017 1280.977 132271.938 0.01118279 0.00010554 0.01074306 1.5529467 0.345679 0.08053026 9224.658 0.1405606 0.1405606 0.03222278 -7955.77 -0.06945301 0.01587699 0.3348516
    17238 11 2018 -1960.754 135983.841 -0.01482366 0.00021881 0.01573535 0.04311743 0.345679 -0.01130778 -1495.702 0.03083419 0.03083419 0.02694934 -493.994 -0.00373469 -0.01064285 0.3348516
    17238 11 2019 -14123.673 124176.531 -0.10386287 5.07E-06 0.103884 0.10434106 0.345679 -0.1843589 -25069.83 -0.11763403 0.11763403 0.05091326 10945.468 0.08049095 -0.0991344 0.3348516
    21123 17 2018 -2338.985 154811.584 -0.0136963 . . 0.03684286 0.5432099 -0.1258326 -21489.05 -0.07578793 0.07578793 0.0557246 19150.068 0.11213631 -0.00508051 0.3894069

  • #2
    What you should do depends on your goals and your context. You may be a student whose teachers have expectations or instructions you should follow, or a researcher working towards a paper, in which case what people do in your field is key, perhaps regardless of whether people in other fields think that's crazy or bizarre.

    I posted a winsor command to SSC in 1998 because someone wanted to know how to do it and it was a straightforward programming challenge. And there's a winsor2 command building on it, which is in itself fine by me.
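
    One detail worth checking in the code in the question is where winsor2 puts its results. The following is a minimal sketch, assuming the SSC version of winsor2 is installed and using the auto dataset shipped with Stata purely as stand-in data. Under those assumptions, winsor2 without the replace option leaves the original variable untouched and writes the winsorized values to a new variable with the suffix _w, so later lines that refer to the original name would still use unwinsorized values.

    ```stata
    * Assumes winsor2 is installed from SSC: ssc install winsor2
    sysuse auto, clear

    * Default behaviour: price is left untouched; a new variable price_w is created
    winsor2 price, cuts(1 99)

    * With replace, mpg itself is overwritten with the winsorized values
    winsor2 mpg, cuts(1 99) replace

    * Compare the original and winsorized versions
    summarize price price_w mpg
    ```

    If that reading applies to the real code, lines such as the median of DACR_ABS or the scaling of CFO and OPPL would need either the replace option or the _w variable names.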

    But I almost never use even my own command. I can see some point in a Winsorized mean and much more point in a trimmed mean, but Winsorizing in the sense that you produce new variables and work with them henceforth seems a poor way to deal with awkward distributions. There are many different and better answers in my view: Transformations? Generalized linear models? and so on. Nevertheless I understand that Winsorizing is common practice with some kinds of financial data. Despite my asking repeatedly on Statalist, no one has ever offered a textbook or respectable review reference explaining why it's a good idea (and more crucially better than competing ideas). (I am not an economist and so feel zero guilt at relative ignorance of its literature.)
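
    To make the difference between a Winsorized mean and a trimmed mean concrete, here is a minimal sketch using only official Stata commands. The auto dataset, the 5% and 95% cut-offs, and the variable names price_w and price_tr are illustrative assumptions, not taken from the question. The same pattern applied to firm-level values before a by-group mean would show how much the cleaning step moves a country-year aggregate.

    ```stata
    sysuse auto, clear

    * 5th and 95th percentiles of price (the cut-offs are arbitrary here)
    _pctile price, percentiles(5 95)
    local lo = r(r1)
    local hi = r(r2)

    * Winsorized copy: values beyond the cut-offs are pulled in to the cut-offs
    generate double price_w = min(max(price, `lo'), `hi') if !missing(price)

    * Trimmed copy: values beyond the cut-offs are set to missing
    generate double price_tr = price if inrange(price, `lo', `hi')

    * Group means of raw, winsorized, and trimmed values side by side
    tabstat price price_w price_tr, by(foreign) statistics(mean n)
    ```

    The observation counts in the output make the trade-off visible: the trimmed column loses observations, while the Winsorized column keeps them at altered values.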

    You didn't use dataex but your data example works well after copy and paste to Stata's Editor. I guess your real dataset is much bigger, which would be excellent, but your example data are the evidence we have.

    I pushed your example data through multqplot from the Stata Journal: that's just a convenience command for showing distributions as quantile plots. Here I plot against standard normal deviates. In doing that, normality of distribution is just a benchmark and I am not implying that departure from normality need be a problem. But as I understand it the principle behind Winsorizing is that awkward univariate distributions are a problem that should be fixed early. (In contrast, it seems to me that decisions on outliers and variable distributions need a look at the dataset as a whole and subject-matter knowledge too.)
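
    For readers without multqplot, a rough stand-in for a single variable is the official qnorm command, which also plots quantiles against those of a normal distribution. This sketch is not the command used for the figure, and the auto dataset and the variable ln_price are assumptions made only for illustration.

    ```stata
    sysuse auto, clear

    * Quantile-normal plot: points close to the reference line suggest a roughly normal shape
    qnorm price

    * The same plot after a log transformation, for comparison
    generate double ln_price = ln(price)
    qnorm ln_price
    ```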

    [Figure: elena.png, quantile plots of the example variables against standard normal deviates]


    There is almost the entire range of possibilities here, from variables that are very nicely behaved and appear not to need any treatment whatsoever -- to one variable BTD where some care may be needed, but you'd need to look at the full dataset, and the care need not mean Winsorization.

    I have no idea what any of these variables mean. I note that variables that may be a little awkward and can range from large negative to large positive are often tamed by logp1() or asinh().
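
    A small sketch of what such a transformation does to values of both signs, using official Stata functions only. The toy values are made up for illustration. The second transformation, sign(x) * ln(1 + |x|), is one possible sign-preserving log and is an assumption here, not necessarily what the post has in mind.

    ```stata
    clear
    input double x
    -50000
    -120
    -1
    0
    1
    120
    50000
    end

    * Inverse hyperbolic sine: defined for negative, zero, and positive values
    generate double x_asinh = asinh(x)

    * Sign-preserving log: sign(x) * ln(1 + |x|)
    generate double x_slog = sign(x) * ln(1 + abs(x))

    list, clean
    ```

    Both transformations keep the sign and the ordering of the values and leave zero at zero, while pulling in large magnitudes without discarding or altering any observation's rank.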
