DocumentCode :
1220845
Title :
Indexing high-dimensional data for efficient in-memory similarity search
Author :
Cui, Bin ; Beng Chin Coi ; Su, Jianwen ; Tan, Kian-Lee
Author_Institution :
Dept. of Comput. Sci., Nat. Univ. of Singapore, Singapore
Volume :
17
Issue :
3
fYear :
2005
fDate :
3/1/2005 12:00:00 AM
Firstpage :
339
Lastpage :
353
Abstract :
In main memory systems, the L2 cache typically employs cache line sizes of 32-128 bytes. These values are relatively small compared to high-dimensional data, e.g., >32D. The consequence is that existing techniques (on low-dimensional data) that minimize cache misses are no longer effective. We present a novel index structure, called Δ-tree, to speed up the high-dimensional query in main memory environment. The Δ-tree is a multilevel structure where each level represents the data space at different dimensionalities: the number of dimensions increases toward the leaf level. The remaining dimensions are obtained using principal component analysis. Each level of the tree serves to prune the search space more efficiently as the lower dimensions can reduce the distance computation and better exploit the small cache line size. Additionally, the top-down clustering scheme can capture the feature of the data set and, hence, reduces the search space. We also propose an extension, called Δ+-tree, that globally clusters the data space and then partitions clusters into small regions. The Δ+-tree can further reduce the computational cost and cache misses. We conducted extensive experiments to evaluate the proposed structures against existing techniques on different kinds of data sets. Our results show that the Δ+-tree is superior in most cases.
Keywords :
cache storage; database indexing; principal component analysis; query processing; tree data structures; tree searching; very large databases; Δ-tree index structure; high-dimensional data indexing; in-memory similarity search; main memory systems; principal component analysis; top-down clustering scheme; Computational efficiency; Computer Society; Indexes; Indexing; Information retrieval; Multimedia databases; Principal component analysis; Query processing; Random access memory; Spatial databases;
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
ieee
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2005.46
Filename :
1388245
Link To Document :
بازگشت