Fast-CBUS: A fast clustering-based undersampling method for addressing the class imbalance problem

Nir Ofek, Lior Rokach, Roni Stern, Asaf Shabtai

Neurocomputing 243, 88-102, 2017

Datasets that have imbalanced class distributions pose a challenge for learning and classification algorithms. Imbalanced datasets exist in many domains, such as: fraud detection, sentiment analysis, churn prediction, and intrusion detection in computer networks. To solve the imbalance problem, three main approaches are typically used: data resampling, method adaptation and cost-sensitive learning; of these, data resampling, either oversampling the minority class instances or undersampling the majority class instances, is the most used approach. However, in most cases, when implementing these approaches, there is a trade-off between the predictive performance and the complexity. In this paper we introduce a fast, novel clustering-based undersampling technique for addressing binary-class imbalance problems, which demonstrates high predictive performance, while its time complexity is bound by the size …