Title :
Out-of-core assessment of clustering tendency for large data sets
Author :
Pakhira, Malay K.
Author_Institution :
Kalyani Gov. Eng. Coll., Kalyani, India
Abstract :
Automatically determining the number of clusters present in a data set is an important problem. Conventional clustering techniques assume a given number of clusters and then try to find the cluster structure associated with that number. For very large and complex data sets, it is not easy to guess this number. Validity-based clustering techniques address the problem by computing a cluster validity measure for the clustering results obtained over a range of candidate cluster numbers, and then selecting the number for which the validity measure is optimal. This approach is, however, cumbersome and may not always be applicable to very large data sets. Recently, an interesting visual technique for assessing clustering tendency, called VAT (visual assessment of cluster tendency), has been developed. The original VAT and its variants have been found to determine the number of clusters quite satisfactorily before any clustering algorithm is actually applied. In this paper, we propose an out-of-core VAT algorithm (o-VAT) for very large data sets.
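The abstract does not describe how VAT works internally. As a rough illustration only, the Python sketch below implements the ordering step of the original in-memory VAT as it is commonly described: a Prim-like traversal that starts at an endpoint of the largest dissimilarity and repeatedly appends the unvisited object nearest to the already-ordered set. It is not the o-VAT algorithm proposed in this paper, and nothing out-of-core is modeled (the full dissimilarity matrix is assumed to fit in memory). The helper `estimate_k` and its `threshold` parameter are hypothetical additions, not part of VAT: because the attachment distances are the edge weights of a minimum spanning tree, counting those above a threshold and adding one gives the number of single-linkage groups at that level, a crude stand-in for counting dark diagonal blocks in the VAT image.

```python
import numpy as np

def vat(R):
    """Sketch of VAT reordering for a symmetric dissimilarity matrix R.

    Returns the reordered matrix, the permutation, and the dissimilarity
    at which each object (after the first) was attached to the ordering.
    """
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    # Seed: row index of the largest dissimilarity in R.
    i, _ = np.unravel_index(np.argmax(R), R.shape)
    order = [int(i)]
    visited = np.zeros(n, dtype=bool)
    visited[i] = True
    # dist[j]: smallest dissimilarity from object j to any ordered object.
    dist = R[i].copy()
    link = []
    for _ in range(n - 1):
        # Only unvisited objects are candidates for the next position.
        cand = np.where(~visited, dist, np.inf)
        j = int(np.argmin(cand))
        order.append(j)
        link.append(float(cand[j]))
        visited[j] = True
        dist = np.minimum(dist, R[j])
    P = np.array(order)
    # Permute rows and columns; dark diagonal blocks suggest clusters.
    return R[np.ix_(P, P)], order, link

def estimate_k(link, threshold):
    """Hypothetical helper: 1 + number of attachment distances above threshold."""
    return 1 + sum(d > threshold for d in link)

# Two well-separated 1-D groups, deliberately interleaved.
x = np.array([5.0, 0.0, 5.1, 0.1, 5.2, 0.2])
R = np.abs(x[:, None] - x[None, :])
R_star, order, link = vat(R)
```

With this input the ordering places the three points near 0 first and the three near 5 afterwards, and `estimate_k(link, 1.0)` reports two groups. The threshold is an arbitrary choice for the example; VAT itself leaves the block count to visual inspection of `R_star`.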
Keywords :
pattern clustering; very large databases; cluster validity measure; clustering algorithm; clustering tendency; complex data set; o-VAT algorithm; out-of-core VAT algorithm; out-of-core assessment; very large data set; Clustering algorithms; Data engineering; Data mining; Displays; Educational institutions; Government; Machine learning; Machine learning algorithms; Pixel; Size control; Clustering; Number of clusters; VAT algorithm; Visual assessment;
Conference_Title :
Advance Computing Conference (IACC), 2010 IEEE 2nd International
Conference_Location :
Patiala
Print_ISBN :
978-1-4244-4790-9
Electronic_ISBN :
978-1-4244-4791-6
DOI :
10.1109/IADCC.2010.5423044