Title :
Co-ClusterD: A Distributed Framework for Data Co-Clustering with Sequential Updates
Author :
Sen Su ; Xiang Cheng ; Lixin Gao ; Jiangtao Yin
Abstract :
Co-clustering is a powerful data mining tool for co-occurrence and dyadic data. As data sets grow larger, the scalability of co-clustering becomes increasingly important. In this paper, we propose two approaches to parallelize co-clustering with sequential updates in a distributed environment. Based on these two approaches, we present a new distributed framework, Co-ClusterD, that supports efficient implementations of co-clustering algorithms with sequential updates. We design and implement Co-ClusterD, and show its efficiency through two co-clustering algorithms: fast nonnegative matrix tri-factorization (FNMTF) and information theoretic co-clustering (ITCC). We evaluate our framework on both a local cluster of machines and the Amazon EC2 cloud. Our evaluation shows that co-clustering algorithms implemented in Co-ClusterD can achieve better results and run faster than their traditional concurrent counterparts.
Keywords :
data mining; distributed processing; information theory; matrix decomposition; pattern clustering; Amazon EC2 cloud; Co-ClusterD; FNMTF; ITCC; coclustering parallelization approach; data coclustering algorithm; distributed environment; distributed framework; dyadic data; fast nonnegative matrix trifactorization; information theoretic coclustering; sequential updates; Algorithm design and analysis; Clustering algorithms; Convergence; Distributed databases; Linear programming; Scalability; Synchronization; Cloud Computing; Co-Clustering; Concurrent Updates; Distributed Framework; Sequential Updates
Conference_Title :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
DOI :
10.1109/ICDM.2013.76