• DocumentCode
    3322092
  • Title

    Compact Similarity Joins

  • Author

    Bryan, Brent ; Eberhardt, Frederick ; Faloutsos, Christos

  • Author_Institution
    Dept. of Machine Learning, Carnegie Mellon Univ., Pittsburgh, PA
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    346
  • Lastpage
    355
  • Abstract
    Similarity joins have attracted significant interest, with applications in geographical information systems, astronomy, marketing analyzes, and anomaly detection. However, all the past algorithms, although highly fine-tuned, suffer an output explosion if the query range is even moderately large relative to the local data density. Under such circumstances, the response time and the search effort are both almost quadratic in the database size, which is often prohibitive. We solve this problem by providing two algorithms that find a compact representation of the similarity join result, while retaining all the information in the standard join. Our algorithms have the following characteristics: (a) they are at least as fast as the standard similarity join algorithm, and typically much faster, (b) they generate significantly smaller output, (c) they provably lose no information, (d) they scale well to large data sets, and (e) they can be applied to any of the standard tree data structures. Experiments on real and realistic point-sets show that our algorithms are up to several orders of magnitude faster.
  • Keywords
    data structures; database management systems; geographic information systems; trees (mathematics); compact similarity join algorithms; geographical information systems; local data density; tree data structures; Astronomy; Character generation; Databases; Delay; Explosions; Information analysis; Information management; Information systems; Management information systems; Tree data structures;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type

    conf

  • DOI
    10.1109/ICDE.2008.4497443
  • Filename
    4497443