A Comparative Evaluation of the Outlier Detection Methods
Abstract
In data mining, outliers should be identified and excluded from the data set before analysis begins so that descriptive statistics and other statistical model parameters are calculated correctly. This paper compares the performance of model-based, density-based, clustering-based, angle-based, and isolation-based outlier detection methods used in data mining. ROC curves and AUC values were used to compare the methods. A data set following a standard normal distribution was simulated, and a logistic regression model was fit to it. To compare the methods, the data set was then contaminated by randomly adding 30 outliers. The iForest algorithm was found to have higher predictive power than Mahalanobis distance, LOF, k-means, and ABOD. In addition, outliers in a real data set were detected with the iForest algorithm and removed. The data sets with and without outliers were then compared. The results showed that the model fit to the data without outliers had higher predictive ability.
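The comparison procedure described in the abstract (simulate standard normal data, add 30 outliers, score every point, compare detectors by AUC) can be sketched as follows. This is a minimal illustration, not the paper's code: the sample size of 500, the two dimensions, the ring-shaped placement of the outliers, and the function names `simulate`, `mahalanobis_scores`, `knn_scores`, and `auc` are all assumptions. The k-nearest-neighbour distance score is a simple stand-in for a density-based detector and is not LOF itself.

```python
import math
import random


def simulate(n_inliers=500, n_outliers=30, seed=0):
    """Standard normal inliers plus injected outliers (assumed design)."""
    rng = random.Random(seed)
    points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_inliers)]
    labels = [0] * n_inliers
    # Assumption: outliers lie on a ring of radius 5 to 8 around the origin.
    for _ in range(n_outliers):
        r = rng.uniform(5, 8)
        theta = rng.uniform(0, 2 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
        labels.append(1)
    return points, labels


def mahalanobis_scores(points):
    """Squared Mahalanobis distance of each 2-D point to the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    scores = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Closed-form inverse of the 2x2 covariance matrix.
        scores.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return scores


def knn_scores(points, k=5):
    """Distance to the k-th nearest neighbour (larger = more isolated)."""
    scores = []
    for i, (x, y) in enumerate(points):
        dists = sorted(
            math.hypot(x - u, y - v)
            for j, (u, v) in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores


def auc(scores, labels):
    """AUC as the probability that an outlier outscores an inlier."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


points, labels = simulate()
results = {
    "Mahalanobis": auc(mahalanobis_scores(points), labels),
    "kNN distance": auc(knn_scores(points), labels),
}
for name, value in results.items():
    print(f"{name}: AUC = {value:.3f}")
```

Because every detector is reduced to one score per point, any further method can be added to the comparison by supplying another scoring function and passing its output to `auc`.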
Related Papers
- Dual-regularized multi-view outlier detection (2015)
- Why is this an anomaly? Explaining anomalies using sequential explanations (2021), cited 26 times
- Identification of outliers in pollution concentration levels using anomaly detection (2016), cited 6 times
- A New Neighborhood-Based Outlier Detection Technique (2019), cited 4 times
- Minimal Rare-Pattern-Based Outlier Detection Method for Data Streams by Considering Anti-monotonic Constraints (2020), cited 1 time