Title :
Cogset: A Unified Engine for Reliable Storage and Parallel Processing
Author :
Valvåg, Steffen Viken ; Johansen, Dag
Author_Institution :
Dept. of Comput. Sci., Univ. of Tromsø, Tromsø, Norway
Abstract :
MapReduce has become a popular paradigm for parallel data processing, both for ad-hoc schema-less processing using a simple functional interface, and as a building block for higher-level abstractions. Much subsequent work has layered additional functionality on top of MapReduce or similar infrastructures, building powerful software stacks for distributed applications. In this paper, we present Cogset, the result of re-thinking the original MapReduce architecture that sits at the bottom of the stack. We observe that the traditional loose coupling between the distributed file system and the MapReduce processing engine leads to poor data locality for many applications. Accordingly, Cogset offers both reliable storage and parallel data processing, fusing the two components into a single system that ensures good data locality. We also take a new approach to data shuffling, relying on highly efficient static routing, and devise new mechanisms for fault tolerance, load balancing and ensuring consistency. We evaluate Cogset using a suite of benchmark applications, comparing it to Hadoop with very favorable results. For example, on a 12-node cluster, an inverted index that takes 80 minutes to build using Hadoop can be constructed using Cogset in less than 35 minutes.
Keywords :
parallel architectures; resource allocation; software fault tolerance; Cogset unified engine; MapReduce architecture; ad-hoc schema-less processing; data locality; data shuffling; fault tolerance; higher-level abstraction; highly efficient static routing; load balancing; parallel data processing; reliable storage; simple functional interface; software stack; Application software; Buildings; Computer architecture; Data processing; Engines; Fault tolerance; File systems; Load management; Parallel processing; Routing; File Systems; MapReduce; Parallel Processing; Reliable Storage;
Conference_Title :
Network and Parallel Computing, 2009. NPC '09. Sixth IFIP International Conference on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4244-4990-3
Electronic_ISBN :
978-0-7695-3837-2
DOI :
10.1109/NPC.2009.23