• DocumentCode
    1961959
  • Title

    Independent quantization: an index compression technique for high-dimensional data spaces

  • Author

    Berchtold, Stefan ; Böhm, Christian ; Jagadish, H.V. ; Kriegel, Hans-Peter ; Sander, Jörg

  • Author_Institution
    stb Software Technologie Beratung GmbH, Augsburg, Germany
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    577
  • Lastpage
    588
  • Abstract
    Two major approaches have been proposed to efficiently process queries in databases: speeding up the search by using index structures, and speeding up the search by operating on a compressed database, such as a signature file. Both approaches have their limitations: indexing techniques are inefficient in extreme configurations, such as high-dimensional spaces, where even a simple scan may be cheaper than an index-based search. Compression techniques are not very efficient in all other situations. We propose to combine both techniques to search for nearest neighbors in a high-dimensional space. For this purpose, we develop a compressed index, called the IQ-tree, with a three-level structure: the first level is a regular (flat) directory consisting of minimum bounding boxes, the second level contains data points in a compressed representation, and the third level contains the actual data. We overcome several engineering challenges in constructing an effective index structure of this type. The most significant of these is to decide how much to compress at the second level. Too much compression will lead to many needless expensive accesses to the third level. Too little compression will increase both the storage and the access cost for the first two levels. We develop a cost model and an optimization algorithm based on this cost model that permits an independent determination of the degree of compression for each second level page to minimize expected query cost. In an experimental evaluation, we demonstrate that the IQ-tree shows a performance that is the “best of both worlds” for a wide range of data distributions and dimensionalities.
  • Keywords
    data compression; database indexing; database theory; optimisation; query processing; software performance evaluation; tree data structures; IQ-tree; cost model; data distribution; database compression; directory; experimental evaluation; high-dimensional data spaces; index compression technique; minimum bounding boxes; nearest neighbor search; optimization; searching; signature file; Cost function; Ear; Encoding; Indexing; Needles; Quantization; Radio access networks; US Department of Transportation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2000. Proceedings. 16th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-0506-6
  • Type

    conf

  • DOI
    10.1109/ICDE.2000.839456
  • Filename
    839456