Title :
Efficient Training Data Reduction for SVM based Handwritten Digits Recognition
Author :
Javed, I. ; Ayyaz, M.N. ; Mehmood, W.
Author_Institution :
Univ. of Eng. & Technol., Lahore
Abstract :
Support vector machines (SVMs) are binary classifiers that separate two classes by finding a maximum-margin hyper-plane between the data samples of the two classes in a given feature space. Once the discrimination function of this hyper-plane has been found during the training stage, any unknown sample can be classified by checking the sign of the discrimination function evaluated at that sample. It is well understood in SVM theory that the SVM discrimination function is largely determined by the data points close to the decision boundary. These data points are called support vectors (SVs). Training an SVM on a large data set is often time consuming. Hence, reducing the original data to contain only the SVs is a useful goal for speeding up the training process, provided that the reduction does not affect the accuracy of the SVM classifier. In this paper, we propose an efficient training data reduction algorithm (Peer-SV) for SVM classifiers. The algorithm is based on the observation that the desired support vectors are those data points which belong to opposite classes and whose diametric sphere contains no other instance of either class. We find these SVs efficiently by computing them between the peer classes only and by removing the farthest points early so that the border points are retained. The algorithm has been tested on handwritten digits data sets. The results obtained on the full data and on the reduced data show the accuracy of the adopted approach.
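The diametric-sphere criterion described in the abstract can be illustrated with a minimal brute-force sketch. This is not the authors' Peer-SV implementation: the function name diametric_sphere_border_points is hypothetical, the use of NumPy and Euclidean distance is an assumption, and the peer-class restriction and early removal of far points mentioned in the abstract are omitted. The sketch simply keeps every point that forms at least one opposite-class pair whose diametric sphere (the sphere having the pair as its diameter) contains no other sample.

```python
import numpy as np

def diametric_sphere_border_points(X, y):
    """Return sorted indices of candidate border points (hypothetical helper).

    A point is kept if it forms at least one pair with a point of the
    opposite class such that no other sample lies strictly inside the
    sphere whose diameter is the segment joining the pair.
    Brute force: O(n^3) distance checks, for illustration only.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    keep = set()
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] == y[j]:
                continue  # only opposite-class pairs are considered
            center = (X[i] + X[j]) / 2.0
            radius_sq = np.sum((X[i] - X[j]) ** 2) / 4.0
            dist_sq = np.sum((X - center) ** 2, axis=1)
            dist_sq[[i, j]] = np.inf  # the pair itself lies on the sphere
            if not np.any(dist_sq < radius_sq):
                keep.update((i, j))
    return sorted(keep)

# Two classes on a line: only the two points facing each other survive.
X = [[0, 0], [1, 0], [2, 0], [4, 0], [5, 0], [6, 0]]
y = [0, 0, 0, 1, 1, 1]
print(diametric_sphere_border_points(X, y))  # [2, 3]
```

In this toy example the reduced set consists of the two samples nearest the gap between the classes, which are the ones a maximum-margin hyper-plane would depend on. An SVM could then be trained on the reduced set instead of the full data.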
Keywords :
data reduction; decision theory; feature extraction; handwritten character recognition; learning (artificial intelligence); pattern classification; support vector machines; Peer-SV algorithm; SVM-based handwritten digits recognition; binary classifiers; decision boundary; discrimination function; feature space; maximum-margin hyper-plane; support vector machine; training data reduction; Computer science; Data engineering; Equations; Handwriting recognition; Nearest neighbor searches; Quadratic programming; Space technology; Support vector machine classification; Support vector machines; Training data;
Conference_Title :
Electrical Engineering, 2007. ICEE '07. International Conference on
Conference_Location :
Lahore
Print_ISBN :
1-4244-0893-8
Electronic_ISBN :
1-4244-0893-8
DOI :
10.1109/ICEE.2007.4287360