• DocumentCode
    1070590
  • Title

    Simultaneous Pattern and Data Clustering for Pattern Cluster Analysis

  • Author

    Wong, Andrew K.C.; Li, Gary C.L.

  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Waterloo, Waterloo, ON
  • Volume
    20
  • Issue
    7
  • fYear
    2008
  • fDate
    7/1/2008
  • Firstpage
    911
  • Lastpage
    923
  • Abstract
    In data mining and knowledge discovery, pattern discovery extracts previously unknown regularities in the data and is a useful tool for categorical data analysis. However, the number of patterns discovered is often overwhelming. It is difficult and time-consuming to 1) interpret the discovered patterns and 2) use them to further analyze the data set. To overcome these problems, this paper proposes a new method that clusters patterns and their associated data simultaneously. When patterns are clustered, the data containing the patterns are also clustered, and the relation between patterns and data is made explicit. Such an explicit relation allows the user, on the one hand, to further analyze each pattern cluster via its associated data cluster and, on the other hand, to interpret why a data cluster is formed via its corresponding pattern cluster. Since the effectiveness of clustering mainly depends on the distance measure, several distance measures between patterns and their associated data are proposed. Their relationships to existing common distance measures are discussed. Once pattern clusters and their associated data clusters are obtained, each of them can be further analyzed individually. To evaluate the effectiveness of the proposed approach, experimental results on synthetic and real data are reported.
  • Keywords
    data analysis; data mining; pattern clustering; associated data; categorical data analysis; data clustering; distance measure; pattern cluster analysis; pattern discovery; Clustering; Data mining; Similarity measures; association rules; classification
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2008.38
  • Filename
    4453823