Title of article :
SVOIS: Support Vector Oriented Instance Selection for text classification
Author/Authors :
Chih-Fong Tsai (author), Che-Wei Chang (author)
Issue Information :
Periodical with serial number, year 2013
Pages :
14
From page :
1070
To page :
1083
Abstract :
Automatic text classification is usually based on models constructed through learning from training examples. However, as the size of text document repositories grows rapidly, the storage requirements and computational cost of model learning are becoming ever higher. Instance selection is one solution to this limitation. Its aim is to reduce the amount of data by filtering out noisy data from a given training dataset. A number of instance selection algorithms have been proposed in the literature, such as ENN, IB3, ICF, and DROP3. However, all of these methods were developed for the k-nearest neighbor (k-NN) classifier. In addition, their performance has not been examined in the text classification domain, where the dimensionality of the dataset is usually very high. Support vector machines (SVMs) are a core text classification technique. In this study, a novel instance selection method, called Support Vector Oriented Instance Selection (SVOIS), is proposed. First, a regression plane in the original feature space is identified by utilizing a threshold distance between the given training instances and their class centers. Then, another threshold distance, between the identified data (forming the regression plane) and the regression plane, is used to decide on the support vectors for the selected instances. The experimental results based on the TechTC-100 dataset show the superior performance of SVOIS over other state-of-the-art algorithms. In particular, using SVOIS to select text documents allows the k-NN and SVM classifiers to perform better than without instance selection.
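The two-threshold procedure outlined in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the details, so the function name `select_instances`, the parameters `center_thresh` and `plane_thresh`, the direction of both threshold comparisons, the use of Euclidean distance, the two-class restriction, and the least-squares fit of the plane are all assumptions made for illustration only.

```python
import numpy as np


def select_instances(X, y, center_thresh, plane_thresh):
    """Return indices of training instances kept by a two-threshold filter.

    Hypothetical sketch of the idea in the abstract, not the published SVOIS.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch assumes exactly two classes")

    # Step 1 (assumption): keep instances lying at least `center_thresh`
    # away from the center (mean) of their own class.
    centers = {c: X[y == c].mean(axis=0) for c in classes}
    dist_center = np.array(
        [np.linalg.norm(x - centers[c]) for x, c in zip(X, y)]
    )
    candidates = np.flatnonzero(dist_center >= center_thresh)
    if candidates.size == 0:
        return candidates

    # Step 2 (assumption): fit a plane w.x + b = 0 by least squares,
    # regressing labels encoded as -1/+1 on the candidate features.
    targets = np.where(y[candidates] == classes[0], -1.0, 1.0)
    A = np.hstack([X[candidates], np.ones((candidates.size, 1))])
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    w, b = coef[:-1], coef[-1]
    w_norm = np.linalg.norm(w)
    if w_norm == 0.0:
        return candidates  # degenerate fit: no usable plane

    # Step 3 (assumption): keep candidates within `plane_thresh`
    # of the fitted plane, as stand-ins for support vectors.
    dist_plane = np.abs(X[candidates] @ w + b) / w_norm
    return candidates[dist_plane <= plane_thresh]


# Toy one-dimensional data: class 0 on the left, class 1 on the right.
X_toy = [[0], [1], [2], [3], [5], [6], [7], [8]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
selected = select_instances(X_toy, y_toy, center_thresh=1.0, plane_thresh=1.5)
```

On the toy data the sketch keeps only the two instances closest to the gap between the classes. Real text data would be high-dimensional and sparse, so both thresholds would need tuning.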
Keywords :
Support Vector Machines , Text classification , Machine Learning , data reduction , Instance selection
Journal title :
Information Systems
Serial Year :
2013
Record number :
1230345