Title :
Kernel-based text categorisation
Author :
Jalam, Radwan ; Teytaud, Olivier
Author_Institution :
ERIC, Univ. Lumiere Lyon II, Mendes, France
Abstract :
Presents some techniques in text categorization. New algorithms, in particular a new support vector machine kernel for text categorization, are developed and compared to usual techniques. This kernel leads to a more natural space for building separations than the Euclidean space of frequencies or even inverse frequencies, as the distance in this space is the most usual pseudo-distance between distributions. We give an application to the recognition of the author of a text, and highlight that our kernel could be used for any classification of distributions. We experimentally discuss the efficiency of our algorithms, depending on the precision of the estimation of frequencies, and the possibility of building statistical bounds on the error. All our experiments are conducted on underconstrained problems.
Keywords :
learning automata; pattern classification; radial basis function networks; text analysis; distributions classification; kernel-based text categorisation; pseudo-distance; separations; statistical bounds; support vector machine kernel; underconstrained problems; Dictionaries; Frequency measurement; Kernel; Noise robustness; Sequences; Smoothing methods; Support vector machines; Text categorization; Text recognition;
Conference_Title :
Neural Networks, 2001. Proceedings. IJCNN '01. International Joint Conference on
Conference_Location :
Washington, DC
Print_ISBN :
0-7803-7044-9
DOI :
10.1109/IJCNN.2001.938452