Title :
Research on K-means Text Clustering Algorithm Based on Semantic
Author :
Liu, Yufang ; Xiao, Shibin ; Lv, Xueqiang ; Shi, Shuicai
Author_Institution :
Chinese Inf. Process. Res. Center, Beijing Inf. Sci. & Technol. Univ., Beijing, China
Abstract :
Building on research into the K-means algorithm for text clustering and the semantic-based vector space model, this paper proposes a semantic-based K-means text clustering model to address the high dimensionality and sparsity of text data sets. The model reduces the semantic loss of the text data and improves the quality of text clustering. Experiments show that, in the final evaluation, semantic-based text clustering improves the F1 index value by more than 6 percent over non-semantic-based clustering.
Keywords :
pattern clustering; text analysis; F1 index value; k-means clustering; semantic-based vector space model; text clustering algorithm; Clustering algorithms; Filtering; Industrial engineering; Information processing; Information science; Information technology; Mathematical model; Optical noise; Partitioning algorithms; Space technology; HowNet; K-means algorithm; Term Contribution; semantic similarity; text vector
Conference_Title :
Computing, Control and Industrial Engineering (CCIE), 2010 International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-4026-9
DOI :
10.1109/CCIE.2010.39