• DocumentCode
    263623
  • Title
    Adaptive Centroid-Based Clustering Algorithm for Text Document Data
  • Author
    Ximing Li ; JiHong Ouyang ; Xiaotang Zhou ; Bo Fu
  • Author_Institution
    CCST, Jilin Univ., Changchun, China
  • fYear
    2014
  • fDate
    13-15 July 2014
  • Firstpage
    63
  • Lastpage
    68
  • Abstract
    Document clustering is a widely studied research problem that aims to partition a corpus into subgroups of homogeneous documents. Traditional clustering approaches generally do not consider the weights of words with respect to clusters. To address this problem, we propose an Adaptive Centroid-based Clustering (ACC) algorithm. The Class-Feature-Centroid (CFC) algorithm, a successful supervised centroid-based classifier, takes relationships among words into account. ACC employs this discriminative CFC vector to drive the clustering procedure. Since clustering is unsupervised, ACC begins with hundreds of small clusters to obtain acceptable CFC vectors, and then iteratively regroups clusters of documents until convergence. Because ACC is self-organized, it can determine the number of clusters adaptively. The experimental results show that ACC achieves performance competitive with state-of-the-art clustering approaches.
  • Keywords
    document handling; pattern classification; pattern clustering; vectors; ACC algorithm; CFC algorithm; CFC vector; adaptive centroid-based clustering algorithm; class-feature-centroid algorithm; corpus partition; document clustering; homogeneous documents; supervised centroid-based classifier; text document data; Algorithm design and analysis; Clustering algorithms; Entropy; Frequency modulation; Measurement; Partitioning algorithms; Vectors; Class-Feature-Centroid; adaptively
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Architectures, Algorithms and Programming (PAAP), 2014 Sixth International Symposium on
  • Conference_Location
    Beijing
  • ISSN
    2168-3034
  • Print_ISBN
    978-1-4799-3844-5
  • Type
    conf
  • DOI
    10.1109/PAAP.2014.13
  • Filename
    6916438
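
The procedure summarized in the abstract (start from many small clusters, build a CFC-style vector per cluster, regroup documents until nothing changes) might be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' implementation: the record does not give the weighting formula, so the sketch assumes the commonly cited CFC form `b**(DF/|C_j|) * log(|C|/CF)`; the base `b`, the cosine similarity, the round-robin initialization, and the names `cfc_centroids`, `cosine`, and `acc_sketch` are all hypothetical choices.

```python
import math
from collections import Counter, defaultdict


def cfc_centroids(docs, labels, b=math.e - 1.7):
    """Build one CFC-style weight vector per cluster.

    Assumed weight: b**(DF/|C_j|) * log(|C|/CF), where DF is the number of
    documents in cluster j containing the term, |C_j| the cluster size,
    |C| the number of clusters, and CF the number of clusters containing
    the term. The base b is an assumed value greater than 1.
    """
    members = defaultdict(list)
    for doc, lab in zip(docs, labels):
        members[lab].append(set(doc))
    n_clusters = len(members)
    df = {}          # per-cluster document frequency of each term
    cf = Counter()   # number of clusters in which each term occurs
    for lab, term_sets in members.items():
        df[lab] = Counter(t for s in term_sets for t in s)
        cf.update(df[lab].keys())
    return {
        lab: {
            t: (b ** (n / len(members[lab]))) * math.log(n_clusters / cf[t])
            for t, n in df[lab].items()
        }
        for lab in members
    }


def cosine(doc_terms, centroid):
    """Cosine similarity between a binary document vector and a centroid."""
    norm_c = math.sqrt(sum(w * w for w in centroid.values()))
    if norm_c == 0 or not doc_terms:
        return 0.0
    dot = sum(centroid.get(t, 0.0) for t in doc_terms)
    return dot / (norm_c * math.sqrt(len(doc_terms)))


def acc_sketch(docs, n_init, max_iter=50):
    """Regroup documents until assignments stop changing.

    Clusters that lose all their documents simply disappear, so the final
    number of clusters is decided by the data rather than fixed up front.
    """
    # Assumption: deterministic round-robin start with n_init small clusters.
    labels = [i % n_init for i in range(len(docs))]
    for _ in range(max_iter):
        centroids = cfc_centroids(docs, labels)
        new_labels = []
        for doc, old in zip(docs, labels):
            terms = set(doc)
            # Pick the most similar centroid; on ties, stay in the old cluster.
            best = max(
                sorted(centroids),
                key=lambda lab: (cosine(terms, centroids[lab]), lab == old),
            )
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    # Renumber surviving clusters as 0..k-1.
    remap = {lab: i for i, lab in enumerate(sorted(set(labels)))}
    return [remap[lab] for lab in labels]
```

Documents are passed as lists of tokens, for example `acc_sketch([["gpu", "cuda"], ["apple", "banana"]], n_init=2)`; the returned list holds one cluster label per document, and the number of distinct labels is the adaptively chosen cluster count.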