Title :
Adaptive Indexing for Distributed Array Processing
Author :
Yifeng Geng ; Xiaomeng Huang ; Guangwen Yang
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fDate :
June 27 2014-July 2 2014
Abstract :
Scientists face a data deluge in scientific exploration. Scientific instruments and experiments collect large volumes of data, usually as multidimensional arrays stored across many files. Distributed computing techniques such as MapReduce make exploring these large datasets practical. Indexing is a well-known way to shorten query processing time, but most existing indexing methods need a full load of the raw data to build the index. In this paper, we propose a distributed adaptive indexing method for distributed array-oriented query processing. Our method does not require a full scan of the array data. For each subarray accessed by a subtask, we divide the array into multiple logical blocks of a suitable size. When a query is handled, the normal processing routine is executed; meanwhile, the index for the blocks accessed by the query is built at low cost, so the whole index grows as queries are processed. This incremental approach exploits the data accessed by historical queries and eliminates the long load procedure. Experiments show that our adaptive indexing, implemented over Hadoop and Hive, effectively accelerates array-oriented query processing without introducing much overhead.
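The incremental idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not say what the per-block index stores, so this sketch assumes a (min, max) summary per logical block, a one-dimensional array, and a simple "value greater than threshold" query. The class name `AdaptiveBlockIndex` and all of its members are hypothetical.

```python
from typing import Dict, List, Tuple


class AdaptiveBlockIndex:
    """Hypothetical sketch: per-block (min, max) summaries built lazily by queries."""

    def __init__(self, data: List[float], block_size: int) -> None:
        self.data = data
        self.block_size = block_size
        # Assumed index content: block id -> (min, max) of that block.
        self.summaries: Dict[int, Tuple[float, float]] = {}
        # Counter kept only to make the saved work visible.
        self.blocks_scanned = 0

    def query(self, start: int, stop: int, threshold: float) -> List[int]:
        """Return positions i in [start, stop) where data[i] > threshold."""
        start = max(start, 0)
        stop = min(stop, len(self.data))
        hits: List[int] = []
        if start >= stop:
            return hits
        first_block = start // self.block_size
        last_block = (stop - 1) // self.block_size
        for block_id in range(first_block, last_block + 1):
            lo = block_id * self.block_size
            hi = min(lo + self.block_size, len(self.data))
            summary = self.summaries.get(block_id)
            # An already indexed block whose maximum is too small is skipped.
            if summary is not None and summary[1] <= threshold:
                continue
            # Normal processing routine: read the block and filter it.
            block = self.data[lo:hi]
            self.blocks_scanned += 1
            # Side effect of the scan: index this block if it is not indexed yet.
            if summary is None:
                self.summaries[block_id] = (min(block), max(block))
            for offset, value in enumerate(block):
                position = lo + offset
                if start <= position < stop and value > threshold:
                    hits.append(position)
        return hits
```

In this sketch the first query over a region pays the normal scan cost and leaves summaries behind, and later queries over the same region skip blocks that cannot match. No upfront pass over the whole array is needed, which mirrors the "no full load" property the abstract describes.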
Keywords :
Big Data; database indexing; distributed processing; query processing; scientific information systems; Hadoop; Hive; MapReduce; array-oriented query processing; data deluge; distributed adaptive indexing method; distributed array processing; distributed array-oriented query processing; distributed computing techniques; long load procedure; multidimensional arrays; query handling; scientific experiments; scientific explorations; scientific instruments; Acceleration; Arrays; Indexing; Parallel processing
Conference_Title :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
DOI :
10.1109/BigData.Congress.2014.55