DocumentCode :
420953
Title :
Design and implementation of a multi-label Chinese text categorization system
Author :
Chen, Junli ; Zhou, Xuezhong ; Wu, Zhaohui
Author_Institution :
Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
Volume :
3
fYear :
2004
fDate :
15-19 June 2004
Firstpage :
1885
Abstract :
Based on the Chinese character representation and the boosting algorithm, a multi-label Chinese text categorization system is demonstrated. This system has been successfully tested on two multi-labeled datasets, namely the traditional Chinese medicine (TCM) dataset TCM-MED and Reuters-21578. Experiments have also been carried out to compare the performance of the boosting algorithm with two other traditional algorithms on these two datasets. The results indicate that the boosting algorithm outperforms the other two algorithms in Chinese text categorization.
Keywords :
character recognition; feature extraction; learning (artificial intelligence); text analysis; Chinese character representation; boosting algorithm; multilabel Chinese text categorization system; multilabeled datasets; traditional Chinese medicine dataset; Boosting; Computer science; Educational institutions; Medical tests; Postal services; System testing; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Control and Automation, 2004. WCICA 2004. Fifth World Congress on
Print_ISBN :
0-7803-8273-0
Type :
conf
DOI :
10.1109/WCICA.2004.1341906
Filename :
1341906