Title :
Bootstrap Sampling Based Data Cleaning and Maximum Entropy SVMs for Large Datasets
Author :
Senzhang Wang ; Zhoujun Li ; Xiaoming Zhang
Author_Institution :
State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
Abstract :
Support Vector Machines (SVMs) are popular machine learning algorithms based on Statistical Learning Theory (SLT). However, traditional SVM training solutions suffer from O(n²) time complexity. In this paper, a novel two-stage informative pattern abstraction algorithm is proposed. The first stage of the algorithm is data cleaning based on bootstrap sampling: a set of weak SVM classifiers is trained on small datasets drawn by bootstrap sampling, and training samples correctly classified by all of the weak classifiers are removed from the training set. In the second stage, to further improve the performance of the final classifier and reduce training time, two novel informative pattern extraction algorithms based on entropy maximization SVMs are proposed. Empirical studies show that our approach is effective in reducing the size of training datasets and the computational cost, outperforming the state-of-the-art SVM training algorithms PEGASOS, RSVM and LIBLINEAR SVM while achieving comparable classification accuracy.
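The first-stage cleaning rule can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weak learner, the bootstrap sample size, or the number of classifiers, so this sketch assumes a linear SVM trained with Pegasos-style stochastic subgradient descent as the weak learner, a default sample size of n/2, and five weak classifiers. All function and parameter names (`train_weak_svm`, `bootstrap_clean`, `n_models`, `sample_size`) are hypothetical. The second stage (maximum-entropy SVM pattern extraction) is not sketched, since the abstract gives no detail on it.

```python
import random
from typing import List, Optional, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    # Append a constant 1.0 so the bias is learned as an ordinary weight
    # (and, for simplicity, is regularized along with the other weights).
    return list(x) + [1.0]


def train_weak_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.1,
    iters: int = 200,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical weak learner (an assumption): a linear SVM trained with
    Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = rng or random.Random(0)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
        # Regularizer step: shrink every weight toward zero.
        w = [(1.0 - eta * lam) * wj for wj in w]
        if margin < 1.0:
            # Hinge-loss step, taken only when sample i violates the margin.
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0.0 else -1


def bootstrap_clean(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    n_models: int = 5,
    sample_size: Optional[int] = None,
    seed: int = 0,
) -> List[int]:
    """Return indices of samples that survive cleaning, i.e. those misclassified
    by at least one weak SVM. Samples that every weak classifier gets right
    are treated as uninformative and dropped."""
    rng = random.Random(seed)
    n = len(X)
    m = sample_size or max(1, n // 2)  # assumed "small" bootstrap sample size
    kept = set()
    for _ in range(n_models):
        # Bootstrap sampling: draw m indices uniformly *with* replacement.
        idx = [rng.randrange(n) for _ in range(m)]
        w = train_weak_svm([X[i] for i in idx], [y[i] for i in idx], rng=rng)
        for i in range(n):
            if predict(w, X[i]) != y[i]:
                kept.add(i)
    return sorted(kept)
```

As a usage illustration: on a well-separated 1-D dataset (positives at x ≥ 2, negatives at x ≤ -2), every weak classifier in this sketch labels every point correctly, so cleaning removes all of them. If two identical points with opposite labels are added, each weak classifier necessarily misclassifies one of them, so at least one of the pair survives cleaning. Using a union over the weak classifiers keeps any sample that even one of them finds hard, which matches the abstract's rule of discarding only samples that all weak classifiers classify correctly.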
Keywords :
computational complexity; data handling; learning (artificial intelligence); pattern classification; sampling methods; support vector machines; LIBLINEAR SVM; O(n²) time complexity; PEGASOS; RSVM; SLT; SVM classifiers; SVM training algorithms; bootstrap sampling based data cleaning; classification accuracy; entropy maximization SVM; informative pattern extraction algorithms; large datasets; machine learning algorithm; maximum entropy SVM; statistical learning theory; support vector machines; two-stage informative pattern abstraction algorithm; Accuracy; Cleaning; Data mining; Information entropy; Support vector machines; Training; Training data; SVMs; bootstrap sampling; entropy maximization;
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2012 IEEE 24th International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4799-0227-9
DOI :
10.1109/ICTAI.2012.164