DocumentCode
249378
Title
Adaptive Indexing for Distributed Array Processing
Author
Yifeng Geng ; Xiaomeng Huang ; Guangwen Yang
Author_Institution
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear
2014
fDate
June 27 2014-July 2 2014
Firstpage
331
Lastpage
338
Abstract
Scientists face a data deluge in scientific exploration. Scientific instruments and experiments collect big data, usually as multidimensional arrays stored across many files. Distributed computing techniques such as MapReduce make exploring these large datasets practical. Indexing is a well-known way to shorten query processing time, but most existing indexing methods must fully load the raw data before building the index. In this paper, we propose a distributed adaptive indexing method for distributed array-oriented query processing. Our method does not require a full scan of the array data. For each subarray accessed by a subtask, we divide the array into multiple logical blocks of a suitable size. When handling a query, the normal processing routine is executed; meanwhile, the index for the blocks accessed by the query is built at low cost, so the whole index grows as queries are processed. This incremental approach exploits the data accessed by historical queries and eliminates the long load procedure. Experiments show that our adaptive indexing, implemented over Hadoop and Hive, effectively accelerates array-oriented query processing without introducing much overhead.
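The incremental idea in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' Hadoop/Hive implementation: the class name `AdaptiveBlockIndex`, the one-dimensional array, the range-filter query, and the choice of a per-block (min, max) summary as the index content are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Dict, List, Tuple


class AdaptiveBlockIndex:
    """Hypothetical per-subarray index that grows as queries touch blocks."""

    def __init__(self, data: List[float], block_size: int) -> None:
        self.data = data
        self.block_size = block_size
        # block id -> (min, max); stays empty until a query scans that block
        self.summary: Dict[int, Tuple[float, float]] = {}
        self.blocks_scanned = 0  # counter kept only to show the effect

    def query(self, start: int, stop: int, lo: float, hi: float) -> List[int]:
        """Return positions p in [start, stop) with lo <= data[p] <= hi.

        Assumes 0 <= start; stop is clamped to the array length.
        """
        hits: List[int] = []
        stop = min(stop, len(self.data))
        if start >= stop:
            return hits
        first = start // self.block_size
        last = (stop - 1) // self.block_size
        for b in range(first, last + 1):
            known = self.summary.get(b)
            if known is not None and (known[1] < lo or known[0] > hi):
                continue  # the index proves no cell in this block can match
            b_start = b * self.block_size
            b_stop = min(b_start + self.block_size, len(self.data))
            self.blocks_scanned += 1
            # Normal processing: filter the requested cells of this block.
            for p in range(max(start, b_start), min(stop, b_stop)):
                if lo <= self.data[p] <= hi:
                    hits.append(p)
            # Piggyback: the block is assumed to be read as a whole,
            # so summarise all of it while it is in memory.
            if known is None:
                block = self.data[b_start:b_stop]
                self.summary[b] = (min(block), max(block))
        return hits
```

In this sketch the first query over a region scans every block it touches and records a summary for each; later queries over the same region skip any block whose recorded range cannot satisfy the predicate, so no separate load phase is needed.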
Keywords
Big Data; database indexing; distributed processing; query processing; scientific information systems; Hadoop; Hive; MapReduce; array-oriented query processing; big data; data deluge; distributed adaptive indexing method; distributed array processing; distributed array-oriented query processing; distributed computing techniques; long load procedure; multidimensional arrays; query handling; scientific experiments; scientific explorations; scientific instruments; Acceleration; Arrays; Big data; Indexing; Parallel processing; Query processing; MapReduce; big data; indexing; multidimensional array;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location
Anchorage, AK
Print_ISBN
978-1-4799-5056-0
Type
conf
DOI
10.1109/BigData.Congress.2014.55
Filename
6906798