Title :
ParaSAIL: Bitmap Indexing Using Many Cores
Author :
Zhong, Tao ; Doshi, Kshitij A. ; Deng, Gang
Abstract :
In large IoT systems, many millions of records reach back-end servers every second from devices that generate data automatically and continuously. The receiving servers have to process and store these high volumes of data and support interactive ad-hoc and real-time queries that require data to be searched quickly. Servers typically create bitmap indexes to reduce the quantity of data that needs to be examined to find needles in haystacks. Building a bitmap index, particularly when data continues to arrive at a high rate, incurs appreciable CPU consumption. Memory efficiency also becomes a concern, as bitmap indexes can take up space in proportion to the number of logical fields per record -- rather than the number of fields actually populated with data. This paper describes a bitmap index creation method called ParaSAIL (for Parallel Sets of Aligned Index Lines). ParaSAIL focuses on maximizing the speedup from parallel execution by avoiding write-write and write-read cache line conflicts among CPUs. It is thus well positioned to benefit from systems with high numbers of cores -- from mainstream multicore processors such as Intel® Xeon® E5 and E7 machines, to many-core engines such as Intel® Xeon Phi™ coprocessors. ParaSAIL also obtains space efficiencies by creating a clustered bitmap index for an attribute value or range that only occurs intermittently in the data. An evaluation using data from an oceanographic dataset shows that ParaSAIL running on a Xeon Phi coprocessor system indexes 473 million records per second and produces dense bitmaps that don't require further compression.
Keywords :
Big Data; coprocessors; database indexing; multiprocessing systems; parallel processing; query processing; CPU consumption; E5 machines; E7 machines; Intel; IoT systems; ParaSAIL; Xeon Phi coprocessors; back-end servers; bitmap index creation method; bitmap indexing; clustered bitmap index; data process; data store; interactive ad-hoc queries; many-core engines; memory efficiency; multicore processors; parallel execution; parallel sets of aligned index lines; real-time queries; Compounds; Data structures; Indexing; Multicore processing; Compaction; Compression; Internet of Things; Manycore; Multicore
Conference_Title :
Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2014 International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4799-6235-8
DOI :
10.1109/CyberC.2014.49