DocumentCode :
1169215
Title :
MetricMap: an embedding technique for processing distance-based queries in metric spaces
Author :
Wang, Xiong ; Shasha, Dennis ; Zhang, Kaizhong
Author_Institution :
Dept. of Comput. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
Volume :
35
Issue :
5
fYear :
2005
Firstpage :
973
Lastpage :
987
Abstract :
In this paper, we present an embedding technique, called MetricMap, which is capable of estimating distances in a pseudometric space. Given a database of objects and a distance function on those objects that is a pseudometric, we map the objects to vectors in a pseudo-Euclidean space of reasonably low dimension while approximately preserving the distance between each pair of objects. Such an embedding can serve as an approximate oracle for processing a broad class of distance-based queries. It is also adaptable to data mining applications such as data clustering and classification. We present the theory underlying MetricMap and conduct experiments comparing MetricMap with other methods, including MVP-tree and M-tree, in processing distance-based queries. Experimental results on both protein and RNA data demonstrate the good performance of MetricMap and its superiority over the other methods.
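To make the idea of "mapping objects to vectors in a pseudo-Euclidean space" concrete, the following Python sketch shows one generic way such an embedding can be built when only a distance function is available. It assumes a caller-chosen list of reference objects whose first element acts as a pivot. It forms a Gram-like matrix from pairwise distances via the cosine-law identity, eigendecomposes it with Jacobi rotations, and keeps both positive and negative eigenvalues, whose signs define the pseudo-Euclidean signature. This is classical multidimensional scaling extended to indefinite matrices, offered as an illustration under those assumptions, not as a reproduction of the paper's MetricMap algorithm. The names `jacobi_eigh`, `PseudoEuclideanEmbedding`, `embed`, `estimate`, and `approx_nearest` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def jacobi_eigh(a: List[List[float]], tol: float = 1e-12,
                max_sweeps: int = 100) -> Tuple[List[float], List[List[float]]]:
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, V); column l of V is the eigenvector for eigenvalue l.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= tol * tol * sum(x * x for row in a for x in row):
            break  # off-diagonal mass is negligible relative to the whole matrix
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                a[p][q] = a[q][p] = 0.0  # exact zero; removes rounding residue
    return [a[i][i] for i in range(n)], v


class PseudoEuclideanEmbedding:
    """Hypothetical sketch: embed objects given only a distance function.

    refs[0] is the pivot (mapped to the origin); refs[1:] are reference objects.
    """

    def __init__(self, refs: Sequence, dist: Callable, eps: float = 1e-9):
        self.dist = dist
        self.pivot, self.others = refs[0], list(refs[1:])
        self.d0 = [dist(self.pivot, r) for r in self.others]
        k = len(self.others)
        # Gram-like matrix of "inner products" relative to the pivot (cosine law).
        gram = [[(self.d0[i] ** 2 + self.d0[j] ** 2
                  - dist(self.others[i], self.others[j]) ** 2) / 2.0
                 for j in range(k)] for i in range(k)]
        vals, vecs = jacobi_eigh(gram)
        scale = max((abs(x) for x in vals), default=0.0)
        keep = [l for l in range(k) if abs(vals[l]) > eps * scale]
        # +1 for positive eigenvalues, -1 for negative ones (the non-Euclidean part).
        self.signature = [1.0 if vals[l] > 0 else -1.0 for l in keep]
        # Row m maps an object's inner products with the references to coordinate m.
        self.proj = [[self.signature[m] / math.sqrt(abs(vals[l])) * vecs[i][l]
                      for i in range(k)] for m, l in enumerate(keep)]

    def embed(self, obj) -> List[float]:
        d = self.dist(self.pivot, obj)
        g = []
        for d0, r in zip(self.d0, self.others):
            dr = self.dist(obj, r)
            g.append((d * d + d0 * d0 - dr * dr) / 2.0)
        return [sum(p * gi for p, gi in zip(row, g)) for row in self.proj]

    def estimate(self, x: Sequence[float], y: Sequence[float]) -> float:
        # Indefinite "squared distance"; clamped at 0 (an assumption of this sketch).
        sq = sum(s * (a - b) ** 2 for s, a, b in zip(self.signature, x, y))
        return math.sqrt(max(sq, 0.0))


def approx_nearest(emb: PseudoEuclideanEmbedding, query, database: Sequence):
    """Approximate-oracle use: rank database objects by estimated distance only."""
    q = emb.embed(query)
    return min(database, key=lambda o: emb.estimate(q, emb.embed(o)))
```

For instance, with `math.dist` (Python 3.8+) as the distance on 2-D points and references spanning the plane, the estimates match the true Euclidean distances up to rounding. For a distance that is not Euclidean, negative eigenvalues show up as `-1.0` entries in `signature`, and the indefinite squared distance can become negative; clamping it to zero is a choice made only for this sketch. A real system would also precompute the database vectors instead of re-embedding them per query.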
Keywords :
data mining; proteins; query processing; tree searching; M-tree; MVP-tree; MetricMap; RNA data; bioinformatics; data mining; distance estimation; embedding technique; nearest neighbor; protein data; pseudo-Euclidean space; pseudometric space; query processing; similarity search; Computer science; Data mining; Data structures; Databases; Delta modulation; Extraterrestrial measurements; Information retrieval; Nearest neighbor searches; Proteins; RNA; Bioinformatics; data mining; embedding method; metric space; nearest neighbors; similarity search; Algorithms; Artificial Intelligence; Cluster Analysis; Data Interpretation, Statistical; Databases, Factual; Information Storage and Retrieval; Numerical Analysis, Computer-Assisted; Pattern Recognition, Automated; Sequence Analysis; Signal Processing, Computer-Assisted;
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
Publisher :
IEEE
ISSN :
1083-4419
Type :
jour
DOI :
10.1109/TSMCB.2005.848489
Filename :
1510772