• DocumentCode
    3125664
  • Title
    A New Markov Model for Clustering Categorical Sequences
  • Author
    Xiong, Tengke ; Wang, Shengrui ; Jiang, Qingshan ; Huang, Joshua Zhexue
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sherbrooke, Sherbrooke, QC, Canada
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    854
  • Lastpage
    863
  • Abstract
    Clustering categorical sequences remains an open and challenging task due to the lack of an inherently meaningful measure of pairwise similarity between sequences. Model initialization is an unsolved problem in model-based clustering algorithms for categorical sequences. In this paper, we propose a simple and effective Markov model to approximate the conditional probability distribution (CPD) model, and use it to design a novel two-tier Markov model to represent a sequence cluster. Furthermore, we design a novel divisive hierarchical algorithm for clustering categorical sequences based on the two-tier Markov model. The experimental results on data sets from three different domains demonstrate the promising performance of our models and clustering algorithm.
  • Keywords
    Markov processes; pattern clustering; sequences; statistical distributions; categorical sequence clustering; conditional probability distribution model; divisive hierarchical algorithm; model based clustering algorithm; pairwise similarity; sequence cluster; two-tier Markov model; Algorithm design and analysis; Clustering algorithms; Data models; Hidden Markov models; Markov processes; Numerical models; Vectors; Markov model; categorical sequence; clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2011.13
  • Filename
    6137290