DocumentCode :
419636
Title :
Noisy text categorization
Author :
Vinciarelli, Alessandro
Author_Institution :
Dalle Molle Inst. for Perceptual Artificial Intelligence, Switzerland
Volume :
2
fYear :
2004
fDate :
23-26 Aug. 2004
Firstpage :
554
Abstract :
This work presents a system for the categorization of noisy texts. Here, "noisy" means any text obtained through an error-prone extraction process from media other than digital text. We show that, even with an average word error rate of around 50%, the loss in categorization performance relative to the clean version of the same documents is negligible.
Keywords :
text analysis; word processing; average word error rate; categorization performance loss; digital texts; noisy text categorization; Data mining; Databases; Error analysis; Handwriting recognition; Information retrieval; Performance loss; Speech recognition; Support vector machines; Text categorization; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on
ISSN :
1051-4651
Print_ISBN :
0-7695-2128-2
Type :
conf
DOI :
10.1109/ICPR.2004.1334303
Filename :
1334303