Title :
A linguistic feature based text clustering method
Author :
Shi, Kansheng ; Li, Lemin ; He, Jie ; Liu, Haitao ; Zhang, Naitong ; Song, Wentao
Author_Institution :
Shanghai Jiaotong Univ., Shanghai, China
Abstract :
The traditional K-means algorithm is sensitive to the choice of initial centers and easily falls into a local optimum. To avoid this flaw, an improved K-means text clustering method, WIKTCM, is proposed. The new method introduces a new way of selecting the initial centers and accounts for the contribution that features of different parts of speech make to the text. In addition, the impact of outliers is considered. Experimental results show that the new method produces better clustering results.
Keywords :
computational linguistics; pattern clustering; text analysis; WIKTCM; linguistic feature; text clustering; traditional K-means algorithm; Algorithm design and analysis; Clustering algorithms; Clustering methods; Computers; Educational institutions; Mathematical model; Speech; K-means; Sample average similarity; Text clustering; VSM
Conference_Title :
Cloud Computing and Intelligence Systems (CCIS), 2011 IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-61284-203-5
DOI :
10.1109/CCIS.2011.6045042