DocumentCode :
1748812
Title :
Kernel-based text categorisation
Author :
Jalam, Radwan ; Teytaud, Olivier
Author_Institution :
ERIC, Univ. Lumiere Lyon II, Mendes, France
Volume :
3
fYear :
2001
fDate :
2001
Firstpage :
1891
Abstract :
Presents some techniques in text categorization. New algorithms, in particular a new support vector machine kernel for text categorization, are developed and compared to usual techniques. This kernel leads to a more natural space for elaborating separations than the Euclidean space of frequencies or even inverse frequencies, as the distance in this space is the most usual pseudo-distance between distributions. We give an application to the recognition of the author of a text, and highlight that our kernel could be used for any classification of distributions. We experimentally discuss the efficiency of our algorithms, depending on the precision of the estimation of frequencies, and the possibility of building statistical bounds on the error. All our experiments are made on underconstrained problems.
Keywords :
learning automata; pattern classification; radial basis function networks; text analysis; distributions classification; kernel-based text categorisation; pseudo-distance; separations; statistical bounds; support vector machine kernel; underconstrained problems; Dictionaries; Frequency measurement; Kernel; Noise robustness; Sequences; Smoothing methods; Support vector machines; Text categorization; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2001. Proceedings. IJCNN '01. International Joint Conference on
Conference_Location :
Washington, DC
ISSN :
1098-7576
Print_ISBN :
0-7803-7044-9
Type :
conf
DOI :
10.1109/IJCNN.2001.938452
Filename :
938452