• DocumentCode
    3323530
  • Title

    A Fast Similarity Join Algorithm Using Graphics Processing Units

  • Author

    Lieberman, Michael D. ; Sankaranarayanan, Jagan ; Samet, Hanan

  • Author_Institution
    Dept. of Comput. Sci., Maryland Univ., College Park, MD
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    1111
  • Lastpage
    1120
  • Abstract
    A similarity join operation A BOWTIEepsiv B takes two sets of points A, B and a value epsiv isin Ropf, and outputs pairs of points p isin A,q isin B, such that the distance D(p, q) les epsiv. Similarity joins find use in a variety of fields, such as clustering, text mining, and multimedia databases. A novel similarity join algorithm called LSS is presented that executes on a graphics processing unit (GPU), exploiting its parallelism and high data throughput. As GPUs only allow simple data operations such as the sorting and searching of arrays, LSS uses these two operations to cast a similarity join operation as a GPU sort-and-search problem. It first creates, on the fly, a set of space-filling curves on one of its input datasets, using a parallel GPU sort routine. Next, LSS processes each point p of the other dataset in parallel. For each p, it searches an interval of one of the space-filling curves guaranteed to contain all the pairs in which p participates. Using extensive theoretical and experimental analysis, LSS is shown to offer a good balance between time and work efficiency. Experimental results demonstrate that LSS is suitable for similarity joins in large high-dimensional datasets, and that it performs well when compared against two existing prominent similarity join methods.
  • Keywords
    parallel algorithms; quadtrees; set theory; sorting; arrays searching; fast similarity join algorithm; graphics processing units; sort-and-search problem; space-filling curves; Arithmetic; Automation; Computer graphics; Computer science; Educational institutions; Extraterrestrial measurements; Modems; Multimedia databases; Text mining; Throughput;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type

    conf

  • DOI
    10.1109/ICDE.2008.4497520
  • Filename
    4497520