Title :
Enumerating Maximal Bicliques from a Large Graph Using MapReduce
Author :
Mukherjee, Arko Provo ; Tirthapura, Srikanta
Author_Institution :
Iowa State Univ., Ames, IA, USA
fDate :
June 27 2014-July 2 2014
Abstract :
We consider the enumeration of maximal bipartite cliques (bicliques) from a large graph, a task central to many practical data mining problems in social network analysis and bioinformatics. We present novel parallel algorithms for the MapReduce platform, and an experimental evaluation using Hadoop MapReduce. Our algorithm is based on clustering the input graph into smaller sized subgraphs, followed by processing different subgraphs in parallel. Our algorithm uses two ideas that enable it to scale to large graphs: (1) the redundancy in work between different subgraph explorations is minimized through a careful pruning of the search space, and (2) the load on different reducers is balanced through the use of an appropriate total order among the vertices. Our evaluation shows that the algorithm scales to large graphs with millions of edges and tens of millions of maximal bicliques. To our knowledge, this is the first work on maximal biclique enumeration for graphs of this scale.
Keywords :
data analysis; graph theory; pattern clustering; Hadoop MapReduce; clustering; data mining problems; maximal bicliques enumeration; maximal bipartite cliques; parallel algorithms; search space pruning; Clustering algorithms; Complexity theory; Data mining; Load management; Parallel algorithms; Runtime; Social network services; Biclique; Graph Mining; Hadoop; MapReduce; Maximal Biclique Enumeration; Parallel Algorithm;
Conference_Titel :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
DOI :
10.1109/BigData.Congress.2014.105