DocumentCode
262382
Title
Using Accumulo for Graph Twiddling
Author
Webb, Darren
Author_Institution
Cyber Characterisation & Shaping Branch, Defence Sci. & Technol. Organ., Adelaide, SA, Australia
fYear
2014
fDate
3-5 Dec. 2014
Firstpage
285
Lastpage
292
Abstract
There is great interest in using commodity cluster computers to process large graphs, but there is limited research in this area. Graph Twiddling in MapReduce [5] describes a pattern for graph processing that uses a series of sort-and-shuffle MapReduce operations: graphs are iteratively read, processed, and written back to the distributed file system. Storing the graph in a Bigtable-inspired database can provide performance benefits, and some such databases also offer alternative processing mechanisms. Accumulo, for example, implements queries as scans that use distributed iterator trees. An iterator can filter rows, produce a metric, or aggregate rows. When MapReduce is used in combination with Accumulo, multiple scanners are used, each processing a range of an Accumulo table. Instead, we propose to use iterators in place of MapReduce. We show that Accumulo iterators can be used as an effective replacement for MapReduce while providing substantial performance gains. We describe our approach, show where the performance improvements are achieved, and discuss the drawbacks of the approach.
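The iterator-tree idea summarised above can be illustrated with a minimal, self-contained Java sketch. It does not use Accumulo's real iterator API (`SortedKeyValueIterator`); the names `Entry`, `FilterIterator`, `RowSumIterator` and `outDegree` are hypothetical. Each stage wraps the one below it, so a filter and a per-row aggregation are applied in a single pass over sorted entries. Here each entry is assumed to be an edge with row = source vertex and column = destination vertex, and the stack computes out-degree while dropping self-loops.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.TreeMap;
import java.util.function.Predicate;

public class IteratorTreeSketch {

    /** Hypothetical sorted key-value entry: row = source vertex, column = destination vertex. */
    record Entry(String row, String column, long value) {}

    /** Filter stage: passes through only entries that satisfy the predicate. */
    static final class FilterIterator implements Iterator<Entry> {
        private final Iterator<Entry> source;
        private final Predicate<Entry> keep;
        private Entry next;

        FilterIterator(Iterator<Entry> source, Predicate<Entry> keep) {
            this.source = source;
            this.keep = keep;
            advance();
        }

        // Read ahead to the next entry that the predicate accepts, or null at the end.
        private void advance() {
            next = null;
            while (source.hasNext()) {
                Entry e = source.next();
                if (keep.test(e)) { next = e; return; }
            }
        }

        public boolean hasNext() { return next != null; }

        public Entry next() {
            if (next == null) throw new NoSuchElementException();
            Entry result = next;
            advance();
            return result;
        }
    }

    /** Aggregation stage: sums values of consecutive entries that share a row (input sorted by row). */
    static final class RowSumIterator implements Iterator<Entry> {
        private final Iterator<Entry> source;
        private Entry pending; // first entry of the next row group, already read from the source

        RowSumIterator(Iterator<Entry> source) {
            this.source = source;
            this.pending = source.hasNext() ? source.next() : null;
        }

        public boolean hasNext() { return pending != null; }

        public Entry next() {
            if (pending == null) throw new NoSuchElementException();
            String row = pending.row();
            long sum = pending.value();
            pending = null;
            while (source.hasNext()) {
                Entry e = source.next();
                if (e.row().equals(row)) { sum += e.value(); }
                else { pending = e; break; }
            }
            return new Entry(row, "sum", sum);
        }
    }

    /** Sorts the edges to emulate table order, then scans them through filter -> row-sum stages. */
    static Map<String, Long> outDegree(List<Entry> edges, Predicate<Entry> keep) {
        List<Entry> table = new ArrayList<>(edges);
        table.sort(Comparator.comparing(Entry::row).thenComparing(Entry::column));
        Iterator<Entry> stack = new RowSumIterator(new FilterIterator(table.iterator(), keep));
        Map<String, Long> result = new TreeMap<>();
        while (stack.hasNext()) {
            Entry e = stack.next();
            result.put(e.row(), e.value());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> edges = List.of(
            new Entry("a", "b", 1), new Entry("a", "c", 1),
            new Entry("b", "c", 1), new Entry("c", "a", 1), new Entry("c", "c", 1));
        // Drop self-loops, then count outgoing edges per vertex.
        System.out.println(outDegree(edges, e -> !e.row().equals(e.column())));
    }
}
```

Running `main` prints `{a=2, b=1, c=1}`; the self-loop `c -> c` is removed by the filter stage before the aggregation stage sees it. The read-ahead design keeps each stage a plain iterator over sorted input, which mirrors (in simplified form) how a scan can stack filtering and aggregation without an intermediate write to the file system.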
Keywords
Big Data; data handling; graph theory; parallel processing; tree data structures; Accumulo; MapReduce; big table-inspired database; commodity cluster computers; distributed iterator trees; graph twiddling; Clustering algorithms; Computers; Distributed databases; File systems; Google; Servers; Apache Accumulo; MapReduce algorithm; distributed processing;
fLanguage
English
Publisher
ieee
Conference_Titel
Big Data and Cloud Computing (BDCloud), 2014 IEEE Fourth International Conference on
Conference_Location
Sydney, NSW
Type
conf
DOI
10.1109/BDCloud.2014.133
Filename
7034806