DocumentCode :
3546845
Title :
Chinese text feature extraction method based on bigram
Author :
Yang Bin ; Peng Cunlin ; Liu Dan
Author_Institution :
Res. Inst. Electron. Sci. & Technol., Univ. of Electron. Sci. & Technol. of China, Chengdu, China
Volume :
2
fYear :
2013
fDate :
15-17 Nov. 2013
Firstpage :
342
Lastpage :
346
Abstract :
One of the most important issues in Chinese text categorization is Chinese word segmentation. Conventional Chinese word segmentation modules are usually too time-consuming. This paper proposes a novel method in which bigrams are used to quantify the text and improved information gain algorithms are used to select appropriate features during text categorization. Experiments show that the selected bigram features characterise the categories fairly well. Compared with using word features, the proposed method consumes much less time while maintaining the required categorization accuracy. Hence, the method can be applied to time-sensitive systems.
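The idea in the abstract can be illustrated with a minimal sketch: split each document into overlapping character bigrams (no word segmenter needed) and score each bigram by information gain against the class labels. The abstract does not specify the paper's improved information gain algorithms, so the sketch below assumes the standard presence/absence information gain instead; the function names `char_bigrams`, `entropy` and `information_gain` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Return the set of overlapping character bigrams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return {a + b for a, b in zip(chars, chars[1:])}


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels):
    """Score each bigram by standard information gain.

    Assumption: this is the textbook presence/absence formulation,
    not the improved variant the paper refers to.
    """
    n = len(docs)
    class_counts = Counter(labels)
    base = entropy(class_counts.values())

    # present[bigram][label] = number of documents of that class containing the bigram
    present = defaultdict(Counter)
    for text, label in zip(docs, labels):
        for bg in char_bigrams(text):
            present[bg][label] += 1

    gains = {}
    for bg, with_bg in present.items():
        n_with = sum(with_bg.values())
        n_without = n - n_with
        without = [class_counts[c] - with_bg[c] for c in class_counts]
        h_with = entropy(with_bg.values())
        h_without = entropy(without) if n_without else 0.0
        gains[bg] = base - (n_with / n) * h_with - (n_without / n) * h_without
    return gains
```

Ranking the bigrams by this score and keeping the top few hundred or thousand would give a feature vocabulary for a downstream classifier; the cut-off is a tuning choice the abstract does not state.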
Keywords :
feature extraction; feature selection; image segmentation; probability; text analysis; text detection; Chinese text categorization; Chinese text feature extraction method; Chinese word segmentation; bigram feature; categorization accuracy; information gain algorithms; time sensitive systems; word feature; Accuracy; Classification algorithms; Mutual information; Probability; Support vector machine classification; Text categorization; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Communications, Circuits and Systems (ICCCAS), 2013 International Conference on
Conference_Location :
Chengdu
Print_ISBN :
978-1-4799-3050-0
Type :
conf
DOI :
10.1109/ICCCAS.2013.6765352
Filename :
6765352