• DocumentCode
    249378
  • Title

    Adaptive Indexing for Distributed Array Processing

  • Author

    Yifeng Geng ; Xiaomeng Huang ; Guangwen Yang

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
  • fYear
    2014
  • fDate
    June 27 2014-July 2 2014
  • Firstpage
    331
  • Lastpage
    338
  • Abstract
    Scientists face a data deluge in scientific exploration. Scientific instruments and experiments collect big data, usually as multidimensional arrays stored across many files. Distributed computing techniques such as MapReduce make exploring these large datasets practical. Indexing is a well-known way to shorten query processing time, but most existing indexing methods need a full load of the raw data to build the index. In this paper, we propose a distributed adaptive indexing method for distributed array-oriented query processing. Our method does not require a full scan of the array data. For each subarray accessed by a subtask, we divide the array into multiple logical blocks with a proper block size. When a query is handled, the normal processing routine is executed; meanwhile, the index for the blocks accessed by the query is built at low cost, so the whole index grows as queries are processed. This incremental approach exploits the data accessed by historical queries and eliminates the long load procedure. Experiments show that our adaptive indexing, implemented over Hadoop and Hive, effectively accelerates array-oriented query processing without introducing much overhead.
  • Keywords
    Big Data; database indexing; distributed processing; query processing; scientific information systems; Hadoop; Hive; MapReduce; array-oriented query processing; big data; data deluge; distributed adaptive indexing method; distributed array processing; distributed array-oriented query processing; distributed computing techniques; long load procedure; multidimensional arrays; query handling; scientific experiments; scientific explorations; scientific instruments; Acceleration; Arrays; Big data; Indexing; Parallel processing; Query processing; MapReduce; big data; indexing; multidimensional array;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2014 IEEE International Congress on
  • Conference_Location
    Anchorage, AK
  • Print_ISBN
    978-1-4799-5056-0
  • Type

    conf

  • DOI
    10.1109/BigData.Congress.2014.55
  • Filename
    6906798
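
The incremental idea in the abstract (split a subarray into logical blocks, answer the query normally, and index only the blocks the query touched) can be illustrated with a minimal single-process sketch. Everything here is an assumption made for illustration: the abstract does not say what the index stores, so this sketch uses a per-block (min, max) summary, a one-dimensional list in place of a multidimensional array, and the hypothetical names `AdaptiveBlockIndex`, `summaries`, and `blocks_scanned`. It is not the authors' Hadoop/Hive implementation.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveBlockIndex:
    """Hypothetical per-block (min, max) index that grows as queries run."""
    data: list                                      # 1-D stand-in for a subarray
    block_size: int                                 # assumed fixed logical block size
    summaries: dict = field(default_factory=dict)   # block id -> (min, max)
    blocks_scanned: int = 0                         # counts blocks actually read

    def query(self, start, stop, lo, hi):
        """Return values v in data[start:stop] with lo <= v <= hi."""
        start = max(start, 0)
        stop = min(stop, len(self.data))
        hits = []
        if start >= stop:
            return hits
        first = start // self.block_size
        last = (stop - 1) // self.block_size
        for b in range(first, last + 1):
            b_start = b * self.block_size
            b_stop = min(b_start + self.block_size, len(self.data))
            summary = self.summaries.get(b)
            # A block indexed by an earlier query can be skipped when its
            # value range cannot overlap the requested range.
            if summary is not None and (summary[1] < lo or summary[0] > hi):
                continue
            block = self.data[b_start:b_stop]
            self.blocks_scanned += 1
            # Build the index entry as a side effect of normal processing.
            if summary is None:
                self.summaries[b] = (min(block), max(block))
            s = max(start, b_start) - b_start
            e = min(stop, b_stop) - b_start
            hits.extend(v for v in block[s:e] if lo <= v <= hi)
        return hits
```

In this sketch no up-front load is needed: the first query over a region pays the normal scan cost and leaves summaries behind, and later queries over the same region read only the blocks whose summaries might match.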