DocumentCode :
2248937
Title :
Experiments with a hierarchical text categorizer
Author :
Tikk, Domonkos ; Biró, György ; Yang, Jae Dong
Author_Institution :
Dept. of Telecom. & Media Inf., Budapest Univ. of Technol. & Econ., Hungary
Volume :
2
fYear :
2004
fDate :
25-29 July 2004
Firstpage :
1191
Abstract :
HITEC is a hierarchical text categorizer tool based on the UFEX (universal feature extractor) algorithm. This paper presents experiments on the effectiveness of HITEC on several natural languages (English, German) and on various kinds of text corpora. The results show that HITEC outperforms its known competitors on the investigated corpora, and that its performance is independent of the language processed. The time and storage requirements of HITEC are modest, so it can be run on an average PC.
Keywords :
feature extraction; natural languages; text analysis; English language; German language; HITEC; hierarchical text categorizer tool; text corpora; universal feature extractor; Databases; Feature extraction; Informatics; Internet; Natural languages; Power generation economics; Taxonomy; Telecommunications; Text categorization; XML
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on
ISSN :
1098-7584
Print_ISBN :
0-7803-8353-2
Type :
conf
DOI :
10.1109/FUZZY.2004.1375582
Filename :
1375582