DocumentCode :
3310602
Title :
A new partitioning based algorithm for document clustering
Author :
Zonghu Wang ; Zhijing Liu ; Donghui Chen ; Kai Tang
Author_Institution :
Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an, China
Volume :
3
fYear :
2011
fDate :
26-28 July 2011
Firstpage :
1741
Lastpage :
1745
Abstract :
Document clustering is one of the key problems in text mining and information retrieval. It groups text documents in a way that maximizes the similarity within clusters and minimizes the similarity between different clusters. Most partitioning-based algorithms are sensitive to the initial centroids: the clustering result depends greatly on them. This paper first uses an unsupervised feature selection method to reduce the dimension of the document feature space and then proposes a novel partitioning-based algorithm that selects initial cluster centroids during the clustering process according to the size and density of clusters in the dataset. Experiments on several text datasets show that the proposed approach effectively improves clustering quality.
Keywords :
feature extraction; information retrieval; text analysis; clustering quality; document clustering; document feature space; information retrieval; partitioning based algorithm; text datasets; text mining; unsupervised feature selection method; Algorithm design and analysis; Clustering algorithms; Complexity theory; Databases; Educational institutions; Heuristic algorithms; Partitioning algorithms; centroid; clustering; feature; mention; partitioning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-180-9
Type :
conf
DOI :
10.1109/FSKD.2011.6019857
Filename :
6019857