Title :
Text Clustering Based on Key Phrases
Author :
Wang, Ai ; Li, YaoDong ; Wang, Wei
Author_Institution :
Key Lab. of Complex Syst. & Intell. Sci., Chinese Acad. of Sci., Beijing, China
Abstract :
Text clustering is an active and important topic in data mining and information retrieval. This paper proposes KP-FCM, a clustering method that uses key phrases as text features and applies fuzzy c-means (FCM) as the clustering algorithm. Key phrases are extracted by an algorithm based on suffix arrays. Experimental results on two standard text clustering benchmark corpora, OHSUMED (English) and the SOGOU corpus (Chinese), show that KP-FCM outperforms STC-10 and Lingo in terms of overall precision, overall recall, and overall F-measure. This indicates that the approach is effective for both English and Chinese text. Moreover, because the method is based on key phrases, it yields a readable label for each cluster, which makes it easier for users to browse online web search results or large volumes of files.
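The fuzzy c-means step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuzzy_c_means`, its parameters, and the toy key-phrase weight vectors in `docs` are all assumptions made for illustration, and the suffix-array key-phrase extraction is not shown. Each document is assumed to be already represented as a vector of key-phrase weights.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, u), where u[i][j] is the degree to which point i
    belongs to cluster j. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by u[i][j] ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[j][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)  # avoid divide by zero
                for j in range(c)
            ]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                          for k in range(c))
                for j in range(c)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Hypothetical documents as weights over four key phrases (made-up values).
docs = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.9, 0.0, 0.1],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.1, 1.0, 0.9],
]
centers, u = fuzzy_c_means(docs, c=2)
# Hard label per document: the cluster with the highest membership.
labels = [max(range(2), key=lambda j: row[j]) for row in u]
```

Because memberships are soft, a document that shares key phrases with two topics receives a non-trivial weight in both clusters; taking the highest-membership cluster, as in `labels`, is one simple way to obtain a hard assignment when one is needed.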
Keywords :
Internet; fuzzy set theory; pattern clustering; text analysis; KP-FCM clustering method; OHSUMED; SOGOU corpus; data mining; fuzzy c-means; information retrieval; online Web search; suffix array; text clustering; text features; Clustering algorithms; Clustering methods; Data engineering; Data mining; Information retrieval; Information science; Intelligent systems; Laboratories; Natural languages; Web search
Conference_Title :
Information Science and Engineering (ICISE), 2009 1st International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4909-5
DOI :
10.1109/ICISE.2009.1163