Title of article :
Finding the Number of Clusters in Unlabeled Datasets using Extended Dark Block Extraction
Author/Authors :
Srinivasulu Asadi, Ch D V Subba Rao, Cheemalapati Saikrishna
Issue Information :
Journal with serial issue number, year 2010
Abstract :
Cluster analysis is the problem of partitioning a set of objects O = {o1, ..., on} into c self-similar subsets based on available data. In general, clustering of unlabeled data poses three major problems: 1) assessing cluster tendency, i.e., how many clusters to seek; 2) partitioning the data into c meaningful groups; and 3) validating the c clusters that are discovered. We address the first problem, i.e., determining the number of clusters c prior to clustering. Many clustering algorithms require the number of clusters as an input parameter, so the quality of the resulting clusters depends heavily on this value. Most methods are post-clustering measures of cluster validity, i.e., they attempt to choose the best partition from a set of alternative partitions. In contrast, tendency assessment attempts to estimate c before clustering occurs. Here, we represent the structure of an unlabeled data set as a Reordered Dissimilarity Image (RDI), in which the pairwise dissimilarity information for a data set of n objects is displayed as an n x n image. The RDI is generated using VAT (Visual Assessment of Cluster Tendency) and highlights potential clusters as a set of "dark blocks" along the diagonal of the image, so the number of clusters can be easily estimated by counting the dark blocks along the diagonal. We develop a new method called "Extended Dark Block Extraction" (EDBE) for counting the number of dark blocks formed along the diagonal of the RDI. The EDBE method combines several image and signal processing techniques.
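The RDI construction mentioned in the abstract can be illustrated with a minimal sketch of the commonly described VAT reordering. It is not the authors' EDBE implementation: the abstract does not specify the EDBE steps, so none are shown. The function names `vat_order` and `reordered_dissimilarity_image` are hypothetical, and Euclidean distance is an assumption made here for illustration.

```python
import numpy as np

def vat_order(D):
    """Return a VAT-style ordering for a symmetric dissimilarity matrix D.

    Assumption: this follows the usual description of VAT, a Prim-like
    pass that repeatedly appends the unordered object closest to the
    set of objects already ordered.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest pairwise dissimilarity.
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [start]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    # min_dist[q]: smallest dissimilarity from q to any ordered object.
    min_dist = D[start].copy()
    for _ in range(n - 1):
        # Mask out objects that are already ordered.
        candidates = np.where(remaining, min_dist, np.inf)
        nxt = int(np.argmin(candidates))
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, D[nxt])
    return np.array(order)

def reordered_dissimilarity_image(X):
    """Build the n x n RDI for row vectors X (Euclidean distance assumed)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    order = vat_order(D)
    # Permute rows and columns with the same ordering.
    return D[np.ix_(order, order)], order
```

Displaying the returned matrix as a grayscale image (small values dark) should show compact, well-separated groups as dark blocks along the diagonal; counting those blocks is the step that EDBE is said to automate.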
Keywords :
Clustering , Cluster tendency , Reordered Dissimilarity Image , VAT , C-Means clustering
Journal title :
International Journal of Computer Applications