Title :
Using Accumulo for Graph Twiddling
Author_Institution :
Cyber Characterisation & Shaping Branch, Defence Sci. & Technol. Organ., Adelaide, SA, Australia
Abstract :
There is great interest in using commodity cluster computers to process large graphs, but research in this area is limited. Graph Twiddling in MapReduce [5] describes a pattern for graph processing using a series of sort and shuffle MapReduce operations, in which graphs are iteratively read, processed, and written back to the distributed file system. Storing the graph in a Bigtable-inspired database can provide performance benefits. Moreover, some databases offer alternative processing mechanisms that can assist graph processing. Accumulo, for example, implements queries as scans using distributed iterator trees. An iterator can filter, produce a metric, or aggregate rows. MapReduce used in combination with Accumulo uses multiple scanners, each processing a range of an Accumulo table. Instead, we propose using iterators in place of MapReduce. We show that Accumulo iterators can be an effective replacement for MapReduce while providing substantial performance gains. We describe our approach, demonstrate where the performance improvements are achieved, and discuss the drawbacks of the approach.
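The iterator idea summarised in the abstract (filter, then aggregate, applied per table range) can be illustrated with a small toy model. This is only a sketch in plain Python under stated assumptions: it does not use the Accumulo API, and the names `EDGES`, `scan`, `drop_self_loops`, `out_degree`, and `all_degrees` are hypothetical and do not come from the paper. It assumes edges are stored with the source vertex as the row and the destination vertex as the column qualifier, and that no row is split across ranges.

```python
from itertools import groupby

# Toy edge table kept in sorted key order, as in a Bigtable-style store.
# Assumed layout: (row = source vertex, column qualifier = destination vertex).
EDGES = sorted([
    ("a", "b"), ("a", "c"), ("a", "a"),
    ("b", "c"),
    ("c", "a"), ("c", "b"), ("c", "d"),
    ("d", "d"),
])

def scan(table, start, stop):
    """Yield entries whose row lies in the half-open range [start, stop)."""
    for row, dst in table:
        if start <= row < stop:
            yield row, dst

def drop_self_loops(entries):
    """Filtering step: pass through only edges that are not self-loops."""
    for row, dst in entries:
        if row != dst:
            yield row, dst

def out_degree(entries):
    """Aggregating step: collapse each run of equal rows into (row, count)."""
    for row, group in groupby(entries, key=lambda entry: entry[0]):
        yield row, sum(1 for _ in group)

def all_degrees(table, ranges):
    """Apply the same filter-then-aggregate stack to each range and merge.

    Each range stands in for the part of a table handled by one scanner;
    rows are assumed never to span two ranges.
    """
    result = {}
    for start, stop in ranges:
        result.update(out_degree(drop_self_loops(scan(table, start, stop))))
    return result

if __name__ == "__main__":
    print(all_degrees(EDGES, [("a", "c"), ("c", "e")]))
```

Because each step is a generator wrapped around the previous one, entries stream through the stack one at a time and only the aggregated degrees leave it, which is the property that makes stacked iterators a candidate replacement for a separate read, shuffle, and write pass.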
Keywords :
Big Data; data handling; graph theory; parallel processing; tree data structures; Accumulo; MapReduce; Bigtable-inspired database; commodity cluster computers; distributed iterator trees; graph twiddling; Clustering algorithms; Computers; Distributed databases; File systems; Google; Servers; Apache Accumulo; MapReduce algorithm; distributed processing
Conference_Title :
Big Data and Cloud Computing (BdCloud), 2014 IEEE Fourth International Conference on
Conference_Location :
Sydney, NSW
DOI :
10.1109/BDCloud.2014.133