Title :
Experiments with a hierarchical text categorizer
Author :
Tikk, Domonkos ; Biró, György ; Yang, Jae Dong
Author_Institution :
Dept. of Telecom. & Media Inf., Budapest Univ. of Technol. & Econ., Hungary
Abstract :
HITEC is a hierarchical text categorizer tool based on the UFEX (universal feature extractor) algorithm. This paper presents experiments on the effectiveness of HITEC on several natural languages (English, German) and on various kinds of text corpora. The results show that HITEC outperforms its known competitors on the investigated corpora and that its performance is independent of the language processed. The time and storage requirements of HITEC are modest, so it can be run on an average PC.
Keywords :
feature extraction; natural languages; text analysis; English language; German language; HITEC; hierarchical text categorizer tool; text corpora; universal feature extractor; Databases; Feature extraction; Informatics; Internet; Natural languages; Power generation economics; Taxonomy; Telecommunications; Text categorization; XML
Conference_Title :
Fuzzy Systems, 2004. Proceedings. 2004 IEEE International Conference on
Print_ISBN :
0-7803-8353-2
DOI :
10.1109/FUZZY.2004.1375582