Title :
Design and implementation of a multi-label Chinese text categorization system
Author :
Chen, Junli ; Zhou, Xuezhong ; Wu, Zhaohui
Author_Institution :
Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
Abstract :
A multi-label Chinese text categorization system is presented, based on a Chinese character representation and the boosting algorithm. The system has been successfully tested on two multi-labeled datasets: the traditional Chinese medicine (TCM) dataset TCM-MED and Reuters-21578. Experiments were also carried out to compare the performance of the boosting algorithm with that of two other traditional algorithms on both datasets. The results indicate that the boosting algorithm outperforms the other two algorithms in Chinese text categorization.
Keywords :
character recognition; feature extraction; learning (artificial intelligence); text analysis; Chinese character representation; boosting algorithm; multilabel Chinese text categorization system; multilabeled datasets; traditional Chinese medicine dataset; Boosting; Computer science; Educational institutions; Medical tests; Postal services; System testing; Text categorization
Conference_Title :
Intelligent Control and Automation, 2004. WCICA 2004. Fifth World Congress on
Print_ISBN :
0-7803-8273-0
DOI :
10.1109/WCICA.2004.1341906