Title :
Accelerating aggregation using intra-cycle parallelism
Author :
Feng, Ziqiang; Lo, Eric
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong, China
Abstract :
Modern CPUs have a word width of 64 bits, but real data values are usually represented using fewer bits than a CPU word. This underutilization of the CPU at the register level has motivated the recent development of bit-parallel algorithms that carry out data processing operations (e.g., filter scan) on CPU words packed with data values (e.g., 8 data values packed into one 64-bit word). Bit-parallel algorithms fully unleash the intra-cycle parallelism of modern CPUs, and they are especially attractive to main-memory column stores, whose goal is to process data at the speed of the “bare metal”. Main-memory column stores generally focus on analytical queries, where aggregation is a common operation. Current bit-parallel algorithms, however, do not yet cover aggregation. In this paper, we present a suite of bit-parallel algorithms to accelerate all standard aggregation operations: SUM, MIN, MAX, AVG, MEDIAN, and COUNT. The algorithms are designed to fully leverage the intra-cycle parallelism in CPU cores when aggregating words of packed values. Experimental evaluation shows that our bit-parallel aggregation algorithms exhibit significant performance benefits compared with non-bit-parallel methods.
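To make the idea of aggregating over packed words concrete, the following is a minimal sketch of a generic SWAR-style ("SIMD within a register") sum over eight 8-bit values packed into one 64-bit word. It is not the algorithm from the paper, whose details the abstract does not give. The function names `pack8` and `sum_packed_bytes` and the choice of 8-bit fields are assumptions made for illustration only.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # Python ints are unbounded, so emulate a 64-bit word


def pack8(values):
    """Pack eight 8-bit values into one 64-bit word (values[0] in the lowest byte).

    Hypothetical helper, not taken from the paper.
    """
    assert len(values) == 8 and all(0 <= v < 256 for v in values)
    word = 0
    for i, v in enumerate(values):
        word |= v << (8 * i)
    return word & MASK64


def sum_packed_bytes(word):
    """Sum the eight 8-bit fields of a 64-bit word using a fixed number of
    word-wide operations instead of eight separate extractions and additions.
    """
    # Step 1: add neighbouring bytes into four 16-bit fields (each at most 510).
    pairs = (word & 0x00FF00FF00FF00FF) + ((word >> 8) & 0x00FF00FF00FF00FF)
    # Step 2: add neighbouring 16-bit fields into two 32-bit fields (each at most 1020).
    quads = (pairs & 0x0000FFFF0000FFFF) + ((pairs >> 16) & 0x0000FFFF0000FFFF)
    # Step 3: add the two 32-bit halves (at most 2040).
    return (quads & 0xFFFFFFFF) + (quads >> 32)


if __name__ == "__main__":
    w = pack8([1, 2, 3, 4, 5, 6, 7, 8])
    print(sum_packed_bytes(w))
```

Each step operates on all fields of the word at once, which is the kind of intra-cycle parallelism the abstract refers to. A real implementation would run on native 64-bit registers and would need to handle arbitrary field widths and overflow across many words.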
Keywords :
parallel algorithms; query processing; AVG operation; COUNT operation; CPU word; MAX operation; MEDIAN operation; MIN operation; SUM operation; analytical queries; bare metal; bit-parallel algorithms; data processing operations; data values; intracycle parallelism; main-memory column; register level CPU underutilization; Acceleration; Algorithm design and analysis; Central Processing Unit; Layout; Parallel processing; Registers; Standards;
Conference_Title :
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location :
Seoul
DOI :
10.1109/ICDE.2015.7113292