DocumentCode :
783597
Title :
KBA: kernel boundary alignment considering imbalanced data distribution
Author :
Wu, Gang ; Chang, Edward Y.
Author_Institution :
Dept. of Electr. & Comput. Eng., California Univ., Santa Barbara, CA, USA
Volume :
17
Issue :
6
fYear :
2005
fDate :
6/1/2005 12:00:00 AM
Firstpage :
786
Lastpage :
795
Abstract :
An imbalanced training data set can pose serious problems for many real-world data mining tasks that employ SVMs to conduct supervised learning. In this paper, we propose a kernel-boundary-alignment algorithm, which treats the training-data imbalance as prior information to augment SVMs and improve class-prediction accuracy. Using a simple example, we first show that SVMs can suffer from high incidences of false negatives when the training instances of the target class are heavily outnumbered by the training instances of a nontarget class. The remedy we propose is to adjust the class boundary by modifying the kernel matrix according to the imbalanced data distribution. Through theoretical analysis backed by empirical study, we show that our kernel-boundary-alignment algorithm works effectively on several data sets.
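The abstract says the class boundary is adjusted by modifying the kernel matrix according to the imbalance, but it does not give the update rule. The Python sketch below illustrates one generic way to modify a kernel matrix, a conformal rescaling K~ = D K D with D = diag(D(x_1), ..., D(x_n)), which keeps the matrix symmetric and positive semidefinite. Everything specific here is an assumption for illustration, not the authors' KBA algorithm: the factor D(x) = exp(-tau_y * d(x)^2), the "boundary" points, and the hypothetical `class_taus` rule that scales tau by the majority/minority count ratio.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]
Matrix = List[List[float]]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def rbf_kernel_matrix(points: Sequence[Point], gamma: float) -> Matrix:
    """Gaussian kernel matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sq_dist(p, q)) for q in points] for p in points]


def class_taus(labels: Sequence[int], minority: int, base_tau: float) -> Dict[int, float]:
    """Hypothetical rule (assumption): divide tau for the minority class by the
    majority/minority count ratio, so minority points keep larger factors."""
    n_min = sum(1 for y in labels if y == minority)
    n_maj = len(labels) - n_min
    ratio = n_maj / n_min
    return {y: (base_tau / ratio if y == minority else base_tau) for y in set(labels)}


def conformal_factors(points: Sequence[Point], labels: Sequence[int],
                      boundary: Sequence[Point], taus: Dict[int, float]) -> List[float]:
    """Hypothetical factor D(x) = exp(-tau_y * d(x)^2), where d(x) is the distance
    from x to the nearest assumed boundary point and tau_y depends on x's label."""
    return [
        math.exp(-taus[y] * min(sq_dist(x, s) for s in boundary))
        for x, y in zip(points, labels)
    ]


def conformal_transform(K: Matrix, d: Sequence[float]) -> Matrix:
    """K~ = D K D with D = diag(d), i.e. K~[i][j] = d[i] * K[i][j] * d[j].
    Since v^T (D K D) v = (D v)^T K (D v), a PSD K stays PSD."""
    n = len(K)
    return [[d[i] * K[i][j] * d[j] for j in range(n)] for i in range(n)]


# Toy data: three majority points (label -1) and one minority point (label 1).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0)]
labels = [-1, -1, -1, 1]
boundary = [(0.5, 0.5)]  # assumed boundary points, e.g. from a first SVM pass

K = rbf_kernel_matrix(points, gamma=1.0)
taus = class_taus(labels, minority=1, base_tau=1.0)
d = conformal_factors(points, labels, boundary, taus)
K_tilde = conformal_transform(K, d)
```

In this toy case the minority point and the first majority point lie at the same distance from the assumed boundary, yet the minority point receives the larger factor because of the ratio-based tau. The resulting `K_tilde` could then be passed to any SVM solver that accepts a precomputed kernel matrix.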
Keywords :
data mining; learning (artificial intelligence); matrix algebra; pattern classification; statistical analysis; support vector machines; SVM; class-prediction accuracy; imbalanced data distribution; imbalanced training data set; kernel boundary alignment algorithm; real-world data mining task; supervised classification; supervised learning; support vector machines; Algorithm design and analysis; Bayesian methods; Data mining; Diseases; Kernel; Supervised learning; Support vector machine classification; Support vector machines; Surveillance; Training data; Index Terms: imbalanced-data training; supervised classification; support vector machines;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2005.95
Filename :
1423979