• DocumentCode
    2773230
  • Title
    A New MCA-Based Divisive Hierarchical Algorithm for Clustering Categorical Data
  • Author
    Xiong, Tengke ; Wang, Shengrui ; Mayers, André ; Monga, Ernest
  • Author_Institution
    Dept. Comput. Sci., Univ. of Sherbrooke, Sherbrooke, QC, Canada
  • fYear
    2009
  • fDate
    6-9 Dec. 2009
  • Firstpage
    1058
  • Lastpage
    1063
  • Abstract
    Clustering categorical data faces two challenges: the lack of an inherent similarity measure, and the tendency of clusters to be embedded in different subspaces. In this paper, we propose the first divisive hierarchical clustering algorithm for categorical data. The algorithm, which is based on multiple correspondence analysis (MCA), is systematic, efficient and effective. MCA plays an important role in the algorithm by analyzing the data globally. The proposed algorithm has five merits. First, it yields a dendrogram representing nested groupings of patterns and similarity levels at different granularities. Second, it is parameter-free, fully automatic and, most importantly, requires no assumption regarding the number of clusters. Third, it is independent of the order in which the data are processed. Fourth, it is scalable to large data sets. Finally, its novel data representation and Chi-square distance measures make it capable of seamlessly discovering clusters embedded in subspaces. Experiments on both synthetic and real data demonstrate the superior performance of our algorithm.
  • Keywords
    data structures; group theory; pattern clustering; Chi-square distance measures; categorical data clustering; data representation; dendrogram; divisive hierarchical algorithm; multiple correspondence analysis; nested groupings; Algorithm design and analysis; Clustering algorithms; Computational complexity; Computer science; Data analysis; Data mining; Mathematics; Categorical Data; Clustering; Divisive Hierarchical; MCA;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on
  • Conference_Location
    Miami, FL
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-5242-2
  • Electronic_ISBN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2009.118
  • Filename
    5360356