• DocumentCode
    2710472
  • Title
    A Hierarchical Algorithm for Clustering Uncertain Data via an Information-Theoretic Approach
  • Author
    Gullo, Francesco ; Ponti, Giovanni ; Tagarelli, Andrea ; Greco, Sergio
  • Author_Institution
    DEIS, Univ. of Calabria, Rende
  • fYear
    2008
  • fDate
    15-19 Dec. 2008
  • Firstpage
    821
  • Lastpage
    826
  • Abstract
    In recent years there has been a growing interest in clustering uncertain data. In contrast to traditional, "sharp" data representation models, uncertain data objects can be represented in terms of an uncertainty region over which a probability density function (pdf) is defined. In this context, the focus has been mainly on partitional and density-based approaches, whereas hierarchical clustering schemes have drawn less attention. We propose a centroid-linkage-based agglomerative hierarchical algorithm for clustering uncertain objects, named U-AHC. The cluster merging criterion is based on an information-theoretic measure to compute the distance between cluster prototypes. These prototypes are represented as mixture densities that summarize the pdfs of all the uncertain objects in the clusters. Experiments have shown that our method outperforms state-of-the-art clustering algorithms from an accuracy viewpoint while achieving reasonably good efficiency.
  • Keywords
    data mining; information theory; pattern clustering; probability; uncertainty handling; centroid-linkage-based agglomerative hierarchical algorithm; cluster prototype; data representation model; density-based approach; knowledge discovery; probability density function; uncertain data clustering; Biomedical measurements; Clustering algorithms; Data analysis; Data mining; Density measurement; Merging; Partitioning algorithms; Probability density function; Prototypes; Uncertainty; hierarchical clustering; information-theoretic distance measures; uncertain data management
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2008. ICDM '08. Eighth IEEE International Conference on
  • Conference_Location
    Pisa
  • ISSN
    1550-4786
  • Print_ISBN
    978-0-7695-3502-9
  • Type
    conf
  • DOI
    10.1109/ICDM.2008.115
  • Filename
    4781185