Title :
The importance of stop word removal on recall values in text categorization
Author :
Silva, Catarina ; Ribeiro, Bernardete
Author_Institution :
Dept. de Engenharia Inf., Coimbra Univ., Portugal
Abstract :
Given a data set and a learning task such as classification, there are two main motives for applying some form of data set reduction. On the one hand, reduction may improve the performance of the learning algorithm. On the other hand, shrinking the data set saves storage space and computing time. Our purpose is to determine the importance of several basic reduction techniques for Support Vector Machines by comparing the relative performance improvement each yields when applied to the standard REUTERS-21578 benchmark.
Keywords :
classification; data reduction; indexing; information retrieval; support vector machines; text editing; REUTERS-21578 benchmark; algorithm performance improvement; data set reduction; learning task; stop word removal; storage space; text categorization; text classification; Humans; Internet; Large-scale systems; Support vector machine classification; Taxonomy; Text mining; Text processing
Conference_Title :
Neural Networks, 2003. Proceedings of the International Joint Conference on
Print_ISBN :
0-7803-7898-9
DOI :
10.1109/IJCNN.2003.1223656