Title :
A new partitioning based algorithm for document clustering
Author :
Zonghu Wang ; Zhijing Liu ; Donghui Chen ; Kai Tang
Author_Institution :
Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an, China
Abstract :
Document clustering is one of the key problems in text mining and information retrieval. It groups text documents so as to maximize the similarity within clusters and minimize the similarity between different clusters. Most partitioning-based algorithms are sensitive to the initial centroids: the clustering result depends heavily on how they are chosen. This paper first uses an unsupervised feature selection method to reduce the dimension of the document feature space, and then proposes a novel partitioning-based algorithm that selects initial cluster centroids during the clustering process according to the size and density of the clusters in the dataset. Experiments on several text datasets show that the proposed approach effectively improves the quality of clustering.
Keywords :
feature extraction; information retrieval; text analysis; clustering quality; document clustering; document feature space; partitioning based algorithm; text datasets; text mining; unsupervised feature selection method; Algorithm design and analysis; Clustering algorithms; Complexity theory; Databases; Educational institutions; Heuristic algorithms; Partitioning algorithms; centroid; clustering; feature; mention; partitioning
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-180-9
DOI :
10.1109/FSKD.2011.6019857
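
Illustrative sketch :
The abstract outlines a two-stage pipeline: unsupervised feature selection, then partitioning-based clustering with initial centroids chosen by cluster size and density. The record does not give the paper's actual procedure, so the Python sketch below is only a stand-in under stated assumptions: features are filtered by document frequency, seeds are the documents with the most close neighbours under cosine similarity, and the partitioning step is ordinary spherical k-means. All function names, thresholds, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def select_features(docs, min_df=2, max_df_ratio=0.9):
    """Assumed filter: keep terms whose document frequency is in [min_df, max_df_ratio * n]."""
    df = Counter(term for doc in docs for term in set(doc))
    limit = max_df_ratio * len(docs)
    return sorted(t for t, c in df.items() if min_df <= c <= limit)

def vectorize(doc, vocab):
    """Unit-length term-frequency vector over the selected vocabulary."""
    counts = Counter(doc)
    vec = [float(counts[t]) for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def density_seeds(vectors, k, threshold=0.5):
    """Assumed seeding: repeatedly pick the point with the most unclaimed neighbours."""
    remaining = set(range(len(vectors)))
    seeds = []
    while remaining and len(seeds) < k:
        best = max(
            sorted(remaining),
            key=lambda i: sum(cosine(vectors[i], vectors[j]) >= threshold for j in remaining),
        )
        seeds.append(vectors[best])
        # Drop the seed and its neighbourhood so the next seed comes from another region.
        remaining -= {j for j in remaining if cosine(vectors[best], vectors[j]) >= threshold}
        remaining.discard(best)
    return seeds

def kmeans(vectors, seeds, iters=20):
    """Spherical k-means started from the given seeds; returns one label per vector."""
    centroids = [list(s) for s in seeds]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                norm = math.sqrt(sum(x * x for x in mean))
                centroids[c] = [x / norm for x in mean] if norm else mean
    return labels

# Toy, hand-made tokenized documents (hypothetical data, two obvious topics).
docs = [
    ["cluster", "centroid", "kmeans"],
    ["cluster", "centroid", "partition"],
    ["centroid", "kmeans", "cluster"],
    ["query", "index", "retrieval"],
    ["retrieval", "index", "ranking"],
    ["query", "retrieval", "index"],
]
vocab = select_features(docs)
vectors = [vectorize(d, vocab) for d in docs]
labels = kmeans(vectors, density_seeds(vectors, k=2))
print(vocab)
print(labels)
```

Seeding from dense regions rather than at random is one common way to reduce the sensitivity to initial centroids that the abstract mentions; whether it matches the authors' size-and-density criterion cannot be determined from this record.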