Title :
Adaptive Centroid-Based Clustering Algorithm for Text Document Data
Author :
Ximing Li ; JiHong Ouyang ; Xiaotang Zhou ; Bo Fu
Author_Institution :
CCST, Jilin Univ., Changchun, China
Abstract :
Document clustering is a popular research topic that aims to partition a corpus into subgroups of homogeneous documents. Traditional clustering approaches generally fail to consider how word weights relate to clusters. To address this problem, we propose an Adaptive Centroid-based Clustering (ACC) algorithm. The Class-Feature-Centroid (CFC) algorithm, a successful supervised centroid-based classifier, takes relationships among words into account. ACC employs this discriminative CFC vector to drive the clustering procedure. Because clustering is unsupervised, ACC begins with hundreds of small clusters to obtain acceptable CFC vectors, and then iteratively regroups clusters of documents until convergence. Because ACC is self-organized, it can determine the number of clusters adaptively. Experimental results show that ACC achieves performance competitive with state-of-the-art clustering approaches.
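The regrouping loop summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the ACC update rules, so the cosine-style similarity, the stopping test, the function names (cfc_centroids, similarity, regroup) and the default value of b are all assumptions made for illustration. The centroid weight follows the usual statement of the CFC formula, w(t, j) = b^(DF_t^j / |C_j|) * log(|C| / CF_t), where DF_t^j is the number of documents in cluster j that contain term t, |C_j| is the size of cluster j, |C| is the number of clusters, and CF_t is the number of clusters that contain t.

```python
import math
from collections import Counter, defaultdict


def cfc_centroids(docs, labels, b=math.e - 1.7):
    """Build one CFC-style weight vector per cluster (b is an assumed constant > 1)."""
    clusters = defaultdict(list)
    for doc, lab in zip(docs, labels):
        clusters[lab].append(set(doc))
    # CF_t: number of clusters in which term t occurs at least once.
    cf = Counter()
    for members in clusters.values():
        cf.update(set().union(*members))
    n_clusters = len(clusters)
    centroids = {}
    for lab, members in clusters.items():
        # DF_t^j: number of documents in this cluster that contain term t.
        df = Counter(t for d in members for t in d)
        centroids[lab] = {
            t: b ** (df[t] / len(members)) * math.log(n_clusters / cf[t])
            for t in df
        }
    return centroids


def similarity(doc, centroid):
    """Cosine similarity between a binary document vector and a centroid (assumed measure)."""
    terms = set(doc)
    norm = math.sqrt(sum(w * w for w in centroid.values()))
    if norm == 0 or not terms:
        return 0.0
    dot = sum(centroid.get(t, 0.0) for t in terms)
    return dot / (norm * math.sqrt(len(terms)))


def regroup(docs, labels, max_iter=50):
    """Reassign documents to their most similar centroid until labels stop changing."""
    labels = list(labels)
    for _ in range(max_iter):
        centroids = cfc_centroids(docs, labels)
        new_labels = [
            max(sorted(centroids), key=lambda c: similarity(d, centroids[c]))
            for d in docs
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


docs = [["cat", "dog"]] * 3 + [["stock", "bond"]] * 3
result = regroup(docs, [0, 0, 1, 1, 2, 2])
```

In this toy run the mixed middle cluster loses both of its documents and disappears, so the number of clusters falls from three to two without being specified in advance. How ACC forms its initial small clusters is not described in the abstract, so the starting labels here are supplied by hand.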
Keywords :
document handling; pattern classification; pattern clustering; vectors; ACC algorithm; CFC algorithm; CFC vector; adaptive centroid-based clustering algorithm; class-feature-centroid algorithm; corpus partition; document clustering; homogeneous documents; supervised centroid-based classifier; text document data; Algorithm design and analysis; Clustering algorithms; Entropy; Frequency modulation; Measurement; Partitioning algorithms; Vectors; Class-Feature-Centroid; adaptively
Conference_Title :
Parallel Architectures, Algorithms and Programming (PAAP), 2014 Sixth International Symposium on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-3844-5
DOI :
10.1109/PAAP.2014.13