Title :
Exploiting Coarse-Grained Parallelism in B+ Tree Searches on an APU
Author :
Daga, Mayank ; Nutter, Mark
Author_Institution :
AMD Res., Adv. Micro Devices, Inc., Bellevue, WA, USA
Abstract :
B+ tree structured index searches are one of the fundamental database operations and hence, accelerating them is essential. GPUs provide a compelling mix of performance per watt and performance per dollar, and thus are an attractive platform for accelerating B+ tree searches. However, tree search on discrete GPUs presents significant challenges for acceleration due to (i) the irregular representation in memory and (ii) the requirement to copy the tree to the GPU memory over the PCIe bus. In this paper, we present the acceleration of B+ tree searches on a fused CPU+GPU processor (an accelerated processing unit or APU). We counter the aforementioned issues by reorganizing the B+ tree in memory and utilizing the novel heterogeneous system architecture, which eliminates (i) the need to copy the tree to the GPU and (ii) the limitation on the size of the tree that can be accelerated. Our approach exploits the coarse-grained parallelism in tree search, wherein we execute multiple searches in parallel to optimize for the SIMD width without modifying the inherent B+ tree data structure. Our results illustrate that the APU implementation can perform up to 70M1 queries per second and is 4.9x faster in the best case and 2.5x faster on average than a hand-tuned, SSE-optimized, six-core CPU implementation, for varying orders of the B+ tree with 4M keys. We also present an analysis of the effect of caches on performance, and of the efficacy of the APU to eliminate data-copies.
Keywords :
data structures; graphics processing units; parallel processing; tree searching; APU; B+ tree data structure; B+ tree searches; B+ tree structured index searches; CPU+GPU processor; GPU memory; PCIe bus; SIMD; accelerated processing unit; attractive platform; coarse grained parallelism; database operations; exploiting coarse grained parallelism; irregular representation; AMD; B+ Tree; APU; Coarse-Grained Parallelism; Performance Acceleration;
Conference_Titel :
High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion:
Conference_Location :
Salt Lake City, UT
Print_ISBN :
978-1-4673-6218-4
DOI :
10.1109/SC.Companion.2012.40