Title :
Research on Chinese Text Automatic Categorization Based on VSM
Author :
Tong Xiao-Jun ; Cui Ming-Gen ; Song Guo-Long
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Weihai
Abstract :
Automatic text classification is an important application of information processing technology. This paper introduces the key techniques of Chinese text categorization, namely text preprocessing, feature selection, feature representation, and the training and classification algorithms, with particular emphasis on analyzing the most important current feature selection methods. A Chinese text classifier based on the KNN algorithm was developed. The system performs automatic Chinese text categorization well and achieves good classification quality. We also use this classifier to compare several feature selection methods. Finally, we use the experimental results to demonstrate the important role of feature selection in text categorization.
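The pipeline named in the abstract (feature selection, vector space representation, KNN classification) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the text has already been segmented into words, uses chi-square as one commonly used feature selection criterion (the abstract does not say which methods the paper compares), weights terms with TF-IDF, and compares documents by cosine similarity. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def chi_square(term, category, docs, labels):
    """Chi-square association between a term and a category (2x2 contingency table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1          # in category, contains term
        elif has_term:
            b += 1          # other category, contains term
        elif in_cat:
            c += 1          # in category, lacks term
        else:
            d += 1          # other category, lacks term
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return len(docs) * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs, labels, top_n):
    """Keep the top_n terms ranked by their best chi-square score over all categories."""
    vocab = {t for tokens in docs for t in tokens}
    cats = set(labels)
    score = {t: max(chi_square(t, c, docs, labels) for c in cats) for t in vocab}
    return set(sorted(vocab, key=lambda t: (-score[t], t))[:top_n])


def fit_idf(docs, features):
    """Inverse document frequency for the selected features only."""
    n = len(docs)
    df = Counter(t for tokens in docs for t in set(tokens) if t in features)
    return {t: math.log(n / df[t]) + 1.0 for t in df}


def to_vector(tokens, idf):
    """Sparse TF-IDF vector (the VSM representation) as a dict: term -> weight."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(tokens, train_vectors, train_labels, idf, k=3):
    """Majority vote among the k training documents most similar to the query."""
    q = to_vector(tokens, idf)
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(q, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy, pre-segmented training data (hypothetical).
train_docs = [
    ["足球", "比赛", "进球"],
    ["篮球", "比赛", "得分"],
    ["股票", "市场", "上涨"],
    ["银行", "利率", "市场"],
]
train_labels = ["sports", "sports", "finance", "finance"]

features = select_features(train_docs, train_labels, top_n=6)
idf = fit_idf(train_docs, features)
train_vectors = [to_vector(doc, idf) for doc in train_docs]

print(knn_classify(["足球", "比赛"], train_vectors, train_labels, idf))  # sports
print(knn_classify(["股票", "市场"], train_vectors, train_labels, idf))  # finance
```

Changing `top_n` or swapping `chi_square` for another scoring function is one simple way to compare feature selection criteria with the same classifier, which is the kind of comparison the abstract describes.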
Keywords :
text analysis; word processing; Chinese text automatic categorization; KNN algorithm; VSM; automatic text classification; feature representation; feature selection; information processing technology; text preprocessing; training algorithm; Application software; Computer science; Educational institutions; Frequency; Information processing; Information science; Natural languages; Space technology; Text categorization; Vocabulary;
Conference_Title :
Wireless Communications, Networking and Mobile Computing, 2007. WiCom 2007. International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-1311-9
DOI :
10.1109/WICOM.2007.955