Abstract:
A distance permutation index supports fast proximity searching in a high-dimensional metric space. Given some fixed reference sites, for each point in a database the index stores a permutation naming the closest site, the second-closest, and so on. We examine how many distinct permutations can occur as a function of the number of sites and the size of the space. We give theoretical results for tree metrics and vector spaces with L1, L2, and L∞ metrics, improving on the previous best known storage space in the vector case. We also give experimental results and commentary on the number of distance permutations that actually occur in a variety of vector, string, and document spaces.
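
As a rough illustration of the definition above, the following sketch computes the distance permutation of each point in a tiny database and counts how many distinct permutations occur. The site and database coordinates, the function name `distance_permutation`, the choice of the L2 metric, and the tie-breaking rule (lower site index first) are all assumptions made for this example and are not taken from the paper.

```python
from math import dist, factorial

def distance_permutation(point, sites):
    """Return site indices ordered from closest to farthest from `point`.

    Ties are broken by site index; this rule is an assumption of the
    sketch, since the abstract does not specify one.
    """
    return tuple(sorted(range(len(sites)),
                        key=lambda i: (dist(point, sites[i]), i)))

# Hypothetical data: three reference sites in the plane, L2 metric.
sites = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
database = [(1.0, 0.5), (3.5, 0.2), (0.2, 2.5), (2.5, 2.0)]

# The index stores one permutation per database point.
index = {p: distance_permutation(p, sites) for p in database}

# Distinct permutations that actually occur, versus the trivial bound k!.
distinct = set(index.values())
print(len(distinct), "distinct permutations out of at most",
      factorial(len(sites)))
```

With k sites there are at most k! possible permutations, but a given space may realize far fewer; the question of how many can occur is what determines the storage needed per point.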
Keywords:
database management systems; tree searching; counting distance permutations; fast proximity searching; high-dimensional metric space; tree metrics; vector spaces; Application software; Audio databases; Computer science; Costs; Data structures; Distance measurement; Extraterrestrial measurements; Genetics; Image databases; Indexes; AESA; distance permutation; kNN; oriented matroid; proximity preserving order; proximity search