Title :
RAMOBoost: Ranked Minority Oversampling in Boosting
Author :
Chen, Sheng ; He, Haibo ; Garcia, Edwardo A.
Author_Institution :
Dept. of Electr. & Comput. Eng., Stevens Inst. of Technol., Hoboken, NJ, USA
Abstract :
In recent years, learning from imbalanced data has attracted growing attention from both academia and industry due to the explosive growth of applications that use and produce imbalanced data. However, because of the complex characteristics of imbalanced data, many real-world solutions struggle to provide robust efficiency in learning-based applications. To address this problem, this paper presents Ranked Minority Oversampling in Boosting (RAMOBoost), a ranked minority oversampling (RAMO) technique based on the idea of adaptive synthetic data generation in an ensemble learning system. Briefly, RAMOBoost adaptively ranks minority class instances at each learning iteration according to a sampling probability distribution that is based on the underlying data distribution, and it can adaptively shift the decision boundary toward difficult-to-learn minority and majority class instances by using a hypothesis assessment procedure. Simulation analysis on 19 real-world datasets, assessed over various metrics including overall accuracy, precision, recall, F-measure, G-mean, and receiver operating characteristic analysis, is used to illustrate the effectiveness of this method.
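The ranking and oversampling idea described in the abstract can be illustrated with a small sketch. This is a simplified, hypothetical illustration and not the authors' implementation: it assumes that each minority instance is ranked by the number of majority instances (delta) among its k1 nearest neighbours, that this count is mapped to a weight with a logistic function scaled by alpha (our assumption, not quoted from the paper), and that synthetic points are generated SMOTE-style by interpolating toward one of the k2 nearest minority neighbours. The boosting loop and hypothesis assessment procedure are not shown. The helper `imbalance_metrics` uses the standard confusion-matrix definitions of the metrics named above, assuming the minority class is the positive class and F-measure uses beta = 1. All function names and the defaults for k1, k2, and alpha are hypothetical.

```python
import math
import random

# Illustrative sketch only; names and defaults are hypothetical assumptions,
# not the RAMOBoost authors' implementation.


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_minority(X, y, minority_label, k1=5, alpha=0.3):
    """Turn minority instances into a sampling distribution by difficulty.

    delta_i = number of majority instances among the k1 nearest neighbours of
    minority instance x_i (searched over the whole training set).
    Assumed logistic weight: r_i = 1 / (1 + exp(-alpha * delta_i)).
    Returns the minority indices and the normalised probabilities.
    """
    minority = [i for i, label in enumerate(y) if label == minority_label]
    weights = []
    for i in minority:
        others = [j for j in range(len(X)) if j != i]
        neighbours = sorted(others, key=lambda j: _sq_dist(X[i], X[j]))[:k1]
        delta = sum(1 for j in neighbours if y[j] != minority_label)
        weights.append(1.0 / (1.0 + math.exp(-alpha * delta)))
    total = sum(weights)
    return minority, [w / total for w in weights]


def synthesize(X, minority, probs, n_new, k2=5, rng=None):
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    Seeds are drawn from the ranked distribution, so minority points with
    more majority neighbours are oversampled more often.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.choices(minority, weights=probs, k=1)[0]
        peers = [j for j in minority if j != i]
        nearest = sorted(peers, key=lambda j: _sq_dist(X[i], X[j]))[:k2]
        j = rng.choice(nearest)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic


def imbalance_metrics(tp, fp, fn, tn):
    """Standard metrics with the minority class as positive (F-measure, beta = 1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "overall_accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }


if __name__ == "__main__":
    # Toy data: majority (0) near the origin, a safe minority (1) cluster far
    # away, and one "hard" minority point inside the majority region.
    X = [[0, 0], [0, 1], [1, 0], [1, 1],
         [10, 10], [10, 11], [11, 10], [11, 11],
         [0.5, 0.5]]
    y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
    minority, probs = rank_minority(X, y, minority_label=1, k1=3)
    print(dict(zip(minority, (round(p, 3) for p in probs))))
    print(synthesize(X, minority, probs, n_new=3))
    print(imbalance_metrics(tp=8, fp=2, fn=2, tn=88))
```

In the toy data, the minority point at (0.5, 0.5) has only majority neighbours, so it receives the largest sampling probability and seeds more synthetic points. If every minority instance had zero majority neighbours, all weights would be equal and the step would reduce to uniform SMOTE-style generation.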
Keywords :
data mining; iterative methods; learning (artificial intelligence); probability; RAMO technique; RAMOBoost; academia; adaptive synthetic data generation; boosting; data distribution; ensemble learning system; imbalanced data analysis; learning iteration; probability distribution; ranked minority oversampling in boosting; simulation analysis; Boosting; Clustering algorithms; Distribution functions; Euclidean distance; Nearest neighbor searches; Training; Adaptive boosting; ensemble learning; imbalanced data; Algorithms; Artificial Intelligence; Computer Simulation; Data Interpretation, Statistical; Data Mining; Databases, Factual; Models, Statistical;
Journal_Title :
Neural Networks, IEEE Transactions on
DOI :
10.1109/TNN.2010.2066988