• DocumentCode
    1081302
  • Title

    Automatically Determining the Number of Clusters in Unlabeled Data Sets

  • Author

    Wang, Liang ; Leckie, Christopher ; Ramamohanarao, Kotagiri ; Bezdek, James

  • Author_Institution
    Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC
  • Volume
    21
  • Issue
    3
  • fYear
    2009
  • fDate
    3/1/2009 12:00:00 AM
  • Firstpage
    335
  • Lastpage
    350
  • Abstract
    Clustering is a popular tool for exploratory data analysis. One of the major problems in cluster analysis is the determination of the number of clusters in unlabeled data, which is a basic input for most clustering algorithms. In this paper we investigate a new method called DBE (dark block extraction) for automatically estimating the number of clusters in unlabeled data sets, which is based on an existing algorithm for visual assessment of cluster tendency (VAT) of a data set, using several common image and signal processing techniques. Basic steps include: 1) generating a VAT image of an input dissimilarity matrix; 2) performing image segmentation on the VAT image to obtain a binary image, followed by directional morphological filtering; 3) applying a distance transform to the filtered binary image and projecting the pixel values onto the main diagonal axis of the image to form a projection signal; 4) smoothing the projection signal, computing its first-order derivative, and then detecting major peaks and valleys in the resulting signal to decide the number of clusters. Our new DBE method is nearly "automatic", depending on just one easy-to-set parameter. Several numerical and real-world examples are presented to illustrate the effectiveness of DBE.
  • Keywords
    data analysis; document image processing; image segmentation; matrix algebra; pattern clustering; cluster analysis; cluster tendency visual assessment; dark block extraction; directional morphological filtering; input dissimilarity matrix; unlabeled data sets; Algorithm design and analysis; Clustering algorithms; Data analysis; Data mining; Filtering; Image generation; Image segmentation; Pixel; Signal generators; Signal processing algorithms; Cluster Tendency; Clustering; Data and knowledge visualization; Database Applications; Database Management; Information Technology
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2008.158
  • Filename
    4759628
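
The pipeline outlined in the abstract can be illustrated with a small sketch. This is not the authors' DBE implementation: only the VAT reordering (step 1) follows the standard Prim-like VAT procedure, and steps 2 to 4 (segmentation, morphological filtering, distance transform, projection, peak detection) are replaced by a much simpler stand-in that thresholds the reordered matrix and counts breaks along its first off-diagonal. The function names, the sample points, and the default threshold (the mean dissimilarity) are all assumptions made for illustration.

```python
import math


def pairwise_distances(points):
    """Euclidean dissimilarity matrix for a list of coordinate tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def vat_order(D):
    """Return the VAT ordering of object indices for dissimilarity matrix D."""
    n = len(D)
    # Start from one endpoint of the largest dissimilarity.
    start, _ = max(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: D[rc[0]][rc[1]])
    order = [start]
    # nearest[j] = smallest dissimilarity from j to any already-ordered object.
    nearest = {j: D[start][j] for j in range(n) if j != start}
    while nearest:
        j = min(nearest, key=nearest.get)
        order.append(j)
        del nearest[j]
        for k in nearest:
            nearest[k] = min(nearest[k], D[j][k])
    return order


def estimate_clusters(points, threshold=None):
    """Simplified stand-in for DBE: count dark diagonal blocks after VAT.

    Assumption: a block boundary is declared wherever two consecutive
    objects in VAT order are farther apart than `threshold`. This only
    behaves sensibly for compact, well-separated clusters.
    """
    D = pairwise_distances(points)
    order = vat_order(D)
    R = [[D[p][q] for q in order] for p in order]  # reordered "VAT image"
    n = len(R)
    if threshold is None:
        # Assumed default: mean of all dissimilarities (not from the paper).
        threshold = sum(map(sum, R)) / (n * n)
    boundaries = sum(1 for k in range(n - 1) if R[k][k + 1] > threshold)
    return boundaries + 1


# Hypothetical sample data: three compact, well-separated groups.
points = [
    (0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (0.5, 0.2),
    (10.3, 0.4), (0.4, 10.2), (0.1, 0.6), (9.8, 0.5),
    (0.2, 9.7), (0.4, 0.5), (10.1, 0.9), (0.6, 9.9),
]

if __name__ == "__main__":
    print(estimate_clusters(points))
```

The off-diagonal test is far cruder than the projection-signal analysis the abstract describes, and it will over-count on a single cluster or on overlapping clusters. It is meant only to show how the dark blocks along the diagonal of a VAT-reordered matrix relate to the number of clusters.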