DocumentCode
1941679
Title
A Hierarchical VQSVM for Imbalanced Data Sets
Author
Yu, Ting ; Jan, Tony ; Simoff, Simeon ; Debenham, John
Author_Institution
Univ. of Technol., Sydney
fYear
2007
fDate
12-17 Aug. 2007
Firstpage
518
Lastpage
523
Abstract
First, a hierarchical modelling method, VQSVM, is introduced, and some remarks on the method are given. Second, the proposed VQSVM is applied to a nonstandard learning environment: imbalanced data sets. With extremely imbalanced, high-dimensional data sets, standard machine learning techniques tend to be overwhelmed by the large classes. The hierarchical VQSVM combines a set of local models, i.e. codevectors produced by vector quantization, with a global model, i.e. a support vector machine, to rebalance data sets without significant information loss. Issues such as distortion and support vectors are discussed to address the trade-off between information loss and undersampling rate. Experiments compare VQSVM with random resampling techniques on several imbalanced data sets with varied imbalance ratios. The results show that VQSVM performs as well as or better than random resampling, especially on extremely imbalanced large data sets.
Keywords
data analysis; learning (artificial intelligence); support vector machines; vector quantisation; VQSVM hierarchical modelling method; codevectors; imbalanced data sets; machine learning technique; nonstandard learning environment; vector quantization support vector machine; Collaborative work; Data compression; Electromagnetic interference; Filters; International collaboration; Machine learning; Neural networks; Parametric statistics; Support vector machines; Vector quantization
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 2007. IJCNN 2007. International Joint Conference on
Conference_Location
Orlando, FL
ISSN
1098-7576
Print_ISBN
978-1-4244-1379-9
Electronic_ISBN
1098-7576
Type
conf
DOI
10.1109/IJCNN.2007.4371010
Filename
4371010