DocumentCode :
3747503
Title :
A cell-MST-based method for big dataset clustering on limited memory computers
Author :
Duong Van Hieu;Phayung Meesad
Author_Institution :
Faculty of Information Technology, King Mongkut's University of Technology North Bangkok, Bangkok 10800, Thailand
fYear :
2015
Firstpage :
632
Lastpage :
637
Abstract :
This paper presents a new clustering algorithm, called the Cell-MST-based method, which combines a cell-based method with minimum spanning tree based (MST-based) methods. The algorithm is designed for big datasets on a limited-memory computer, especially thin big datasets, which have a small number of attributes but a very large number of instances. First, a cell-based method converts a big dataset into a small grid of cells such that the memory required to store an edge-weighted graph created from the grid is less than the available memory of the computer. Then MST-based methods obtain an optimal threshold, estimate the number of clusters, and determine the initial centroids. The proposed Cell-MST-based method can reduce the required memory by more than 99% compared with previous similarity-based and MST-based cluster number estimation methods. Moreover, the new Cell-MST-based method also outperforms the quantization error modeling method in terms of execution time and estimation accuracy.
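The two-stage idea in the abstract (shrink the data to a grid of cells, then use an MST over the cells to estimate the cluster count) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `cell_size`, the use of cell centroids as graph nodes, and the caller-supplied `threshold` are all assumptions made for illustration, and the abstract does not say how the optimal threshold is obtained. The sketch also uses Prim's algorithm on an implicit complete graph rather than storing the edge-weighted graph.

```python
import math


def grid_cells(points, cell_size):
    """Assign each point to a grid cell and return one centroid per non-empty cell.

    Assumption: cells are axis-aligned hypercubes of side `cell_size`.
    """
    sums = {}  # cell index tuple -> [coordinate sums, point count]
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        if key not in sums:
            sums[key] = [[0.0] * len(p), 0]
        acc = sums[key]
        for i, x in enumerate(p):
            acc[0][i] += x
        acc[1] += 1
    return [tuple(s / n for s in total) for total, n in sums.values()]


def mst_edge_weights(nodes):
    """Return the edge weights of a Euclidean MST over `nodes` (Prim's algorithm).

    Distances are computed on demand, so only O(n) extra memory is used.
    """
    n = len(nodes)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each node to the tree
    best[0] = 0.0
    weights = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:  # the starting node has no incoming edge
            weights.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(nodes[u], nodes[v])
                if d < best[v]:
                    best[v] = d
    return weights


def estimate_cluster_count(points, cell_size, threshold):
    """Estimate the number of clusters by cutting MST edges longer than `threshold`.

    Assumption: `threshold` is supplied by the caller.
    """
    centers = grid_cells(points, cell_size)
    if not centers:
        return 0
    long_edges = sum(1 for w in mst_edge_weights(centers) if w > threshold)
    return long_edges + 1  # cutting k edges of a tree leaves k + 1 components
```

For example, `estimate_cluster_count(points, cell_size=1.0, threshold=3.0)` on two well-separated groups of 2-D points would count the single long MST edge between the groups and report two clusters.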
Keywords :
"Memory management","Clustering algorithms","Computers","Estimation","Partitioning algorithms","Information technology","Cost function"
Publisher :
IEEE
Conference_Titel :
Information Technology and Electrical Engineering (ICITEE), 2015 7th International Conference on
Type :
conf
DOI :
10.1109/ICITEED.2015.7409023
Filename :
7409023