Title :
A Novel Hybrid System for Large-Scale Chinese Text Classification Problem
Author :
Gao, Zhong ; Lu, Guanming ; Gu, Daquan
Author_Institution :
Coll. of Telecommun. & Inf. Eng., Nanjing Univ. of Posts & Telecommun., Nanjing, China
Abstract :
Most Chinese text classification systems are based on the bag-of-words (BW) model, a valid probabilistic tool for text representation that can provide a better semantic architecture. However, its weakness in classification accuracy remains unresolved. The support vector machine (SVM) has become a popular classification tool and can be applied within this scheme, but the main disadvantages of SVM algorithms are their large memory requirements and long computation times on very large datasets. In this paper, we propose a hybrid system based on BW and a novel cascade SVM with feedback, which splits the problem into smaller subsets and trains a network to assign samples to the different subsets. The proposed parallel training algorithm, which applies multiple SVM classifiers to large-scale classification problems, speeds up SVM training and increases classification accuracy.
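The abstract only outlines the cascade. As a rough illustration, the Python sketch below implements the generic cascade SVM with feedback (Graf et al., NIPS 2004), not the authors' system: the data is split into subsets, each first-layer node trains an SVM and forwards only its support vectors, pairs of nodes are merged layer by layer, and the final support vectors are fed back to every first-layer subset until they stop changing. Assumptions: a Pegasos-style linear SVM solver, a random split in place of the paper's subset-assignment network, binary labels, toy 2-D features instead of bag-of-words vectors, and sequential rather than parallel training. All function names and parameter values are hypothetical.

```python
import random

# Hypothetical helper names throughout; a generic cascade-SVM sketch, not the
# authors' implementation. Samples are (x, y): x a tuple of floats, y in {-1, +1}.

def decision(w, x):
    """Linear score f(x) = w[:-1] . x + w[-1]; the last weight is the bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]


def train_linear_svm(data, lam=0.01, epochs=50, seed=0):
    """Assumed solver: Pegasos-style sub-gradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * (len(data[0][0]) + 1)  # extra slot = bias on a constant 1 feature
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = data[i]
            violated = y * decision(w, x) < 1
            w = [(1 - eta * lam) * wj for wj in w]  # step on the regularizer
            if violated:  # step on the hinge loss
                w = [wj + eta * y * xj for wj, xj in zip(w, list(x) + [1.0])]
    return w


def support_vectors(data, w, tol=1e-3):
    """Keep samples on or inside the margin, i.e. y * f(x) <= 1."""
    sv = [(x, y) for x, y in data if y * decision(w, x) <= 1 + tol]
    if len({y for _, y in sv}) < 2:  # degenerate set: pass the whole input up instead
        return list(data)
    return sv


def merge(*parts):
    """Concatenate sample lists, dropping duplicates but keeping order."""
    return list(dict.fromkeys(s for part in parts for s in part))


def train_node(data):
    """One cascade node: train on its input, forward only its support vectors."""
    w = train_linear_svm(data)
    return support_vectors(data, w), w


def cascade_svm(data, n_subsets=4, max_rounds=3, seed=0):
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # assumption: random split into subsets
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    feedback = []
    for _ in range(max_rounds):
        # Layer 1: nodes are independent, so they could be trained in parallel.
        layer = [train_node(merge(s, feedback)) for s in subsets]
        # Higher layers: merge support-vector sets pairwise until one node remains.
        while len(layer) > 1:
            layer = [train_node(merge(*(sv for sv, _ in layer[i:i + 2])))
                     for i in range(0, len(layer), 2)]
        sv, w = layer[0]
        if set(sv) == set(feedback):  # global SV set is stable: stop
            break
        feedback = sv  # feedback: send the global SVs back to every first-layer subset
    return w, sv


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [((rng.gauss(2, 0.5), rng.gauss(2, 0.5)), 1) for _ in range(100)]
    pts += [((rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)), -1) for _ in range(100)]
    w, sv = cascade_svm(pts)
    acc = sum((decision(w, x) > 0) == (y > 0) for x, y in pts) / len(pts)
    print(f"{len(sv)} support vectors, training accuracy {acc:.2f}")
```

For text classification, `x` would be a (typically sparse) bag-of-words vector, and a multi-class corpus would need a scheme such as one-vs-rest on top; both are left out to keep the sketch short.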
Keywords :
classification; feedback; natural language processing; probability; support vector machines; text analysis; word processing; bag of words; feedback; large-scale Chinese text classification; probability tool; semantic architecture; support vector machine; text representation; Computer architecture; Educational institutions; Feedback; Large-scale systems; Machine learning; Natural language processing; Quadratic programming; Support vector machine classification; Support vector machines; Text categorization;
Conference_Title :
Japan-China Joint Workshop on Frontier of Computer Science and Technology, 2008 (FCST '08)
Conference_Location :
Nagasaki
Print_ISBN :
978-1-4244-3418-3
DOI :
10.1109/FCST.2008.29