DocumentCode
263623
Title
Adaptive Centroid-Based Clustering Algorithm for Text Document Data
Author
Ximing Li ; JiHong Ouyang ; Xiaotang Zhou ; Bo Fu
Author_Institution
CCST, Jilin Univ., Changchun, China
fYear
2014
fDate
13-15 July 2014
Firstpage
63
Lastpage
68
Abstract
Document clustering is a popular research topic that aims to partition a corpus into subgroups of homogeneous documents. Traditional clustering approaches generally lack consideration of word weights with respect to clusters. To address this problem, we propose an Adaptive Centroid-based Clustering (ACC) algorithm. The Class-Feature-Centroid (CFC) algorithm, a successful supervised centroid-based classifier, takes relationships among words into account. ACC attempts to employ this discriminative CFC vector to drive the clustering procedure. Since clustering is unsupervised, ACC begins with hundreds of small clusters to obtain acceptable CFC vectors, and then iteratively regroups clusters of documents until convergence. Because ACC is self-organizing, it can determine the number of clusters adaptively. The experimental results show that ACC achieves performance competitive with state-of-the-art clustering approaches.
Keywords
document handling; pattern classification; pattern clustering; vectors; ACC algorithm; CFC algorithm; CFC vector; adaptive centroid-based clustering algorithm; class-feature-centroid algorithm; corpus partition; document clustering; homogeneous documents; supervised centroid-based classifier; text document data; Algorithm design and analysis; Clustering algorithms; Entropy; Frequency modulation; Measurement; Partitioning algorithms; Vectors; Class-Feature-Centroid; adaptively; document clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Architectures, Algorithms and Programming (PAAP), 2014 Sixth International Symposium on
Conference_Location
Beijing
ISSN
2168-3034
Print_ISBN
978-1-4799-3844-5
Type
conf
DOI
10.1109/PAAP.2014.13
Filename
6916438