
Clustering Based Undersampling for Handling Class Imbalance in C4.5 Classification Algorithm

Journal of Physics: Conference Series, 2020, Vol. 1641(1), p. 012014

Citations Over TimeTop 16% of 2020 papers

Wahyu Nugraha, Muhammad Sony Maulana, Agung Sasongko

Abstract

It is very difficult for machine learning to build an effective model when the class distribution of the training dataset is imbalanced. Class imbalance is common in real-world classification, where one class has very few instances (the minority class) while the other classes have many (the majority classes). A model trained without accounting for class imbalance is overwhelmed by majority-class instances and tends to ignore the minority class in its predictions. Random undersampling and oversampling have been widely used in previous studies to address class imbalance. This study instead uses a clustering-based undersampling strategy, with C4.5 as the classification model. Clustering first groups the data, and undersampling is then performed within each group. The goal is to avoid discarding useful samples. Statistical tests on experiments with 10 imbalanced datasets of various sizes from the KEEL repository and Kaggle indicate that clustering-based undersampling yields satisfactory performance, with significant improvements in sensitivity and AUC.
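The abstract does not state which clustering algorithm is used, how many clusters are formed, or how many samples are kept per cluster. The following is therefore only a minimal sketch under our own assumptions: plain k-means clusters the majority class only, and each cluster is randomly undersampled in proportion to its size so that the majority class shrinks to roughly the minority-class count. `kmeans` and `cluster_undersample` are hypothetical helper names, not the authors' code, and the data in the usage example are synthetic.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_undersample(X, y, minority_label, k=3, seed=0):
    """Hypothetical helper (assumption, not the paper's exact procedure):
    cluster the majority class, then randomly undersample each cluster in
    proportion to its size so the majority class shrinks to about the
    minority-class count."""
    rng = random.Random(seed)
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    target = len(minority)
    if len(majority) <= target:
        return list(X), list(y)  # already balanced enough; nothing to remove
    labels = kmeans([x for x, _ in majority], min(k, len(majority)), seed=seed)
    groups = defaultdict(list)
    for item, lab in zip(majority, labels):
        groups[lab].append(item)
    kept = []
    for members in groups.values():
        # Keep at least one sample per cluster so no region of the majority class vanishes.
        n_keep = max(1, round(target * len(members) / len(majority)))
        kept.extend(rng.sample(members, min(n_keep, len(members))))
    balanced = minority + kept
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [t for _, t in balanced]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic 2-D data: 90 majority points in three blobs, 10 minority points.
    X, y = [], []
    for cx, cy in [(0, 0), (5, 5), (0, 5)]:
        for _ in range(30):
            X.append([cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)])
            y.append(0)
    for _ in range(10):
        X.append([5 + rng.gauss(0, 0.5), rng.gauss(0, 0.5)])
        y.append(1)
    Xb, yb = cluster_undersample(X, y, minority_label=1, k=3)
    print("majority kept:", yb.count(0), "minority kept:", yb.count(1))
```

Because each cluster's quota is rounded, the number of majority samples kept is only approximately equal to the minority count. In a typical workflow the resampling is applied to the training split only, so the test split keeps its original imbalance when sensitivity and AUC are measured. The resampled set would then be passed to a decision-tree learner; note that scikit-learn's `DecisionTreeClassifier(criterion="entropy")` implements an optimized CART rather than C4.5, so it would only approximate the classifier used in the paper.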

