DocumentCode
3322092
Title
Compact Similarity Joins
Author
Bryan, Brent ; Eberhardt, Frederick ; Faloutsos, Christos
Author_Institution
Dept. of Machine Learning, Carnegie Mellon Univ., Pittsburgh, PA
fYear
2008
fDate
7-12 April 2008
Firstpage
346
Lastpage
355
Abstract
Similarity joins have attracted significant interest, with applications in geographical information systems, astronomy, marketing analyzes, and anomaly detection. However, all the past algorithms, although highly fine-tuned, suffer an output explosion if the query range is even moderately large relative to the local data density. Under such circumstances, the response time and the search effort are both almost quadratic in the database size, which is often prohibitive. We solve this problem by providing two algorithms that find a compact representation of the similarity join result, while retaining all the information in the standard join. Our algorithms have the following characteristics: (a) they are at least as fast as the standard similarity join algorithm, and typically much faster, (b) they generate significantly smaller output, (c) they provably lose no information, (d) they scale well to large data sets, and (e) they can be applied to any of the standard tree data structures. Experiments on real and realistic point-sets show that our algorithms are up to several orders of magnitude faster.
Keywords
data structures; database management systems; geographic information systems; trees (mathematics); compact similarity join algorithms; geographical information systems; local data density; tree data structures; Astronomy; Character generation; Databases; Delay; Explosions; Information analysis; Information management; Information systems; Management information systems; Tree data structures;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location
Cancun
Print_ISBN
978-1-4244-1836-7
Electronic_ISBN
978-1-4244-1837-4
Type
conf
DOI
10.1109/ICDE.2008.4497443
Filename
4497443
Link To Document