• DocumentCode
    3697082
  • Title

    A Grid-Based k-Nearest Neighbor Join for Large Scale Datasets on MapReduce

  • Author

    Miyoung Jang;Young-Sung Shin;Jae-Woo Chang

  • Author_Institution
    Dept. of Comput. Eng., Chonbuk Nat. Univ., Jeonju, South Korea
  • fYear
    2015
  • Firstpage
    888
  • Lastpage
    891
  • Abstract
    Because MapReduce supports efficient parallel data processing, MapReduce-based query processing algorithms have been widely studied. Among various query types, k-nearest neighbor join, which aims to produce the k nearest neighbors of each point of a dataset from another dataset, has been considered most important in data analysis. Existing k-NN join query processing algorithms on MapReduce suffer from high index construction and computation costs which make them unsuitable for big data processing. In this paper, we propose a new grid-based k-NN join query processing algorithm on MapReduce. First, we design a dynamic grid index that represents the distribution of join datasets. Based on this index, we prune out unnecessary cells for the join with the distance-based filtering. This can reduce the data transmission and computation overheads. From performance analysis, we show that our algorithm outperforms the existing scheme up to seven times in terms of query processing time while achieving high query result accuracy.
  • Keywords
    "Partitioning algorithms","Algorithm design and analysis","Query processing","Indexes","Data processing","Heuristic algorithms","Accuracy"
  • Publisher
    ieee
  • Conference_Titel
    High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), 2015 IEEE 12th International Conferen on Embedded Software and Systems (ICESS), 2015 IEEE 17th International Conference on
  • Type

    conf

  • DOI
    10.1109/HPCC-CSS-ICESS.2015.189
  • Filename
    7336277