After reading previous posts and other resources, I decided that the best way to remove univariate outliers from my variable of interest is the IQR rule. My variable counts the number of use-of-force incidents each staff member was involved in before and after a program. Some staff participated in the program and others did not (control group). I am conducting a paired-sample t-test to compare means, and as you may know, this test is very sensitive to outliers. My strategy is to drop (or at least exclude) observations with a value greater than Q3 + 1.5 × IQR.
-> group = Control

                          UOFCount
      Percentiles      Smallest
 1%            1              1
 5%            1              1
10%            1              1       Obs                 345
25%            1              1       Sum of Wgt.         345

50%            1                      Mean           1.782609
                        Largest       Std. Dev.      1.469353
75%            2              7
90%            3              7       Variance       2.158999
95%            5             10       Skewness       2.862175
99%            7             12       Kurtosis       13.95002
-> group = Experimental

                          UOFCount
      Percentiles      Smallest
 1%            1              1
 5%            1              1
10%            1              1       Obs                 345
25%            1              1       Sum of Wgt.         345

50%            2                      Mean           3.730435
                        Largest       Std. Dev.       3.36532
75%            5             15
90%            8             17       Variance       11.32538
95%           11             22       Skewness       2.067698
99%           15             23       Kurtosis       8.969509
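Written out, the upper fence implied by the quartiles in the output above (Q1 = 1 in both groups, Q3 = 2 for Control and Q3 = 5 for Experimental) is:

```latex
\mathrm{IQR} = Q_3 - Q_1, \qquad \text{upper fence} = Q_3 + 1.5 \times \mathrm{IQR} \\
\text{Control: } 2 + 1.5 \times (2 - 1) = 3.5 \\
\text{Experimental: } 5 + 1.5 \times (5 - 1) = 11
```

Because UOFCount is a whole-number count, this flags Control values of 4 or more and Experimental values of 12 or more.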
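* Upper fence = Q3 + 1.5*(Q3 - Q1), using the quartiles from the output above
gen iqr_value = 5 + (1.5*(5-1)) if group==1
replace iqr_value = 2 + (1.5*(2-1)) if group==0
* Flag values above the fence (1/0); the if-qualifier stops missing preUOF,
* which Stata treats as larger than any number, from being flagged
gen byte iqr_outlier = preUOF > iqr_value if !missing(preUOF)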
Does this look appropriate? Is there a more efficient way to do this, perhaps an egen option? Based on this criterion, 10% of my control group and 3% of my experimental group are outliers.
PS: How can I copy and paste Stata output to the forum in a better way? Is there something like dataex for output?
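On the egen question: egen's pctile() function (with the p() option) accepts a by prefix, so the quartiles can be computed per group rather than typed in by hand. egen also has an iqr() function that returns Q3 - Q1 directly. Below is a minimal sketch on a small made-up dataset. The variable names group and preUOF are taken from the code above; the data values are hypothetical and only serve to make the example self-contained, so replace the input block with your own data.

```stata
* Minimal sketch: per-group IQR outlier flags with egen
* The data below are made up for illustration (hypothetical values)
clear
input byte group int preUOF
0 1
0 1
0 1
0 1
0 2
0 2
0 2
0 9
1 1
1 1
1 1
1 3
1 5
1 5
1 5
1 20
1 .
end

* First and third quartiles within each group (missing values are ignored)
bysort group: egen q1 = pctile(preUOF), p(25)
bysort group: egen q3 = pctile(preUOF), p(75)

* Upper fence = Q3 + 1.5*IQR; egen's iqr() would give q3 - q1 directly
generate upper_fence = q3 + 1.5*(q3 - q1)

* 1 = above the fence, 0 = not; left missing when preUOF is missing
generate byte iqr_outlier = preUOF > upper_fence if !missing(preUOF)

* Share of flagged observations in each group
tabulate group iqr_outlier, row

* To exclude them from the analysis afterwards:
* drop if iqr_outlier == 1
```

With these made-up values, 9 is flagged in group 0 (fence 3.5) and 20 in group 1 (fence 11), and the observation with missing preUOF is left unflagged.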
Thank you,
Marvin