DocumentCode :
2357974
Title :
Design and Implementation of Chinese Text Clustering System
Author :
Tan, Ying ; Huang, Lan ; Qi, Hong ; Zhai, Yandong
Author_Institution :
Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun, China
fYear :
2009
fDate :
25-27 Aug. 2009
Firstpage :
1136
Lastpage :
1140
Abstract :
Clustering is a core technology of text mining. Text clustering divides a large collection of texts into several meaningful classes, or clusters. Based on the features of Chinese documents, this paper designs and implements the Chinese Text Clustering System, which clusters Chinese documents automatically. First, the system performs automatic Chinese word segmentation on the input document set using the reverse maximum matching method. Second, it applies further text preprocessing. Finally, it uses the K-means clustering algorithm to obtain the clustering results. The prototype system can also cluster Chinese Web pages so that search engines can search for a user's interest model, which improves the efficiency of finding the target content.
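The three stages named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy dictionary, the `max_len` limit, the term-frequency vectors standing in for the unspecified "further text preprocessing", and the choice of the first k vectors as initial centroids are all assumptions made for the example.

```python
from collections import Counter


def rmm_segment(text, dictionary, max_len=4):
    """Reverse maximum matching: scan from the end of the text and take the
    longest dictionary word that ends at the current position."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate first, then shrink it from the left.
        for size in range(min(max_len, end), 0, -1):
            candidate = text[end - size:end]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                end -= size
                break
    words.reverse()
    return words


def to_vector(words, vocabulary):
    """Term-frequency vector (assumed preprocessing, not from the paper)."""
    counts = Counter(words)
    return [float(counts[term]) for term in vocabulary]


def kmeans(vectors, k, iterations=10):
    """Plain K-means. The first k vectors seed the centroids, an assumption
    made only to keep the example deterministic."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy dictionary and documents, for illustration only.
DICTIONARY = {"研究", "研究生", "生命", "起源", "文本", "聚类", "搜索", "引擎"}
docs = ["文本聚类", "搜索引擎", "聚类文本", "引擎搜索"]

segmented = [rmm_segment(doc, DICTIONARY) for doc in docs]
vocabulary = sorted({word for words in segmented for word in words})
vectors = [to_vector(words, vocabulary) for words in segmented]
labels = kmeans(vectors, k=2)

print(rmm_segment("研究生命起源", DICTIONARY))  # ['研究', '生命', '起源']
print(labels)  # [0, 1, 0, 1]
```

Scanning from the end matters for the sample sentence: a forward scan would greedily take "研究生" first, while the reverse scan yields "研究 / 生命 / 起源".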
Keywords :
Internet; data mining; pattern clustering; search engines; text analysis; Chinese Web pages clustering; Chinese text clustering system; Chinese word automatic segmentation; K-means clustering algorithm; reverse maximum matching; search engines; text mining; text preprocessing; Clustering algorithms; Computer science; Data mining; Educational institutions; Electronic mail; Particle separators; Prototypes; Search engines; Text mining; Web pages; Chinese text clustering; Chinese word segmentation; K-means algorithm; reverse maximum matching; text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
INC, IMS and IDC, 2009. NCM '09. Fifth International Joint Conference on
Conference_Location :
Seoul
Print_ISBN :
978-1-4244-5209-5
Electronic_ISBN :
978-0-7695-3769-6
Type :
conf
DOI :
10.1109/NCM.2009.234
Filename :
5331328