Title :
Towards Efficient KNN Joins on Data Streams
Author :
Chong Yang ; Xiaohui Yu ; Yang Liu
Author_Institution :
Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan, China
fDate :
June 27 2014-July 2 2014
Abstract :
We study the problem of efficiently processing kNN joins over high-dimensional data streams, an operation required by many big data applications. Specifically, we are concerned with the continuous evaluation of a set Q of k nearest neighbor queries on streams of high-dimensional items at consecutive snapshots of those streams. While one possible solution is to evaluate the kNN joins from scratch at each snapshot, this is too expensive for the large volumes of data encountered in big data applications. We consider the data stream over a time window and maintain the join results for Q at every snapshot in main memory. Our approach is to build indexes on Q and to update only the results of the queries affected by changes in the streams at each snapshot. We propose a main-memory structure called the High-dimensional R-tree (HDR-tree) to index the queries; it finds affected queries efficiently at a reasonable maintenance cost. The HDR-tree takes advantage of clustering and the principal component analysis (PCA) technique. Preliminary experimental results show that our index structures significantly outperform baseline methods.
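The incremental idea in the abstract (update only the queries affected by a change in the window instead of recomputing every join result) can be illustrated with a minimal Python sketch. This is not the HDR-tree: a plain linear scan over the queries stands in for the index, and the class name WindowedKnnJoin, the insert method, and the count-based window are assumptions made only for this illustration.

```python
import heapq
import math
from collections import deque


class WindowedKnnJoin:
    """Hypothetical sketch: kNN results for a fixed query set over a sliding window.

    A linear scan over the queries replaces the paper's HDR-tree index.
    """

    def __init__(self, queries, k, window_size):
        self.queries = list(queries)
        self.k = k
        self.window_size = window_size          # count-based window (assumption)
        self.window = deque()                   # (item_id, point), oldest first
        self.results = [[] for _ in self.queries]  # per query: sorted (distance, item_id)
        self.next_id = 0

    def _recompute(self, qi):
        # Fallback: rebuild one query's result from the whole window.
        q = self.queries[qi]
        self.results[qi] = heapq.nsmallest(
            self.k, ((math.dist(q, p), i) for i, p in self.window)
        )

    def insert(self, point):
        """Add one stream item, expire the oldest if the window is full,
        and return the indexes of the queries whose result changed."""
        item_id = self.next_id
        self.next_id += 1
        expired = None
        if len(self.window) == self.window_size:
            expired = self.window.popleft()[0]
        self.window.append((item_id, point))

        affected = []
        for qi, q in enumerate(self.queries):
            res = self.results[qi]
            if expired is not None and any(i == expired for _, i in res):
                # The query lost a neighbor: recompute it from the window.
                self._recompute(qi)
                affected.append(qi)
                continue
            d = math.dist(q, point)
            if len(res) < self.k or d < res[-1][0]:
                # The new item enters this query's k nearest neighbors.
                res.append((d, item_id))
                res.sort()
                del res[self.k:]
                affected.append(qi)
        return affected


join = WindowedKnnJoin(queries=[(0, 0), (10, 10)], k=2, window_size=3)
for p in [(1, 0), (9, 10), (11, 10), (0, 1)]:
    print(p, "affects queries", join.insert(p))
```

In this sketch a query is touched only when the arriving item is closer than its current k-th neighbor or when the expiring item was one of its neighbors. Finding those queries without scanning all of Q is the part the abstract assigns to the index.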
Keywords :
Big Data; indexing; pattern clustering; principal component analysis; query processing; tree data structures; HDR-tree; PCA technique; big data applications; clustering; high-dimensional R-tree; high-dimensional data streams; index structures; k nearest neighbor queries; kNN joins processing; main-memory structure; maintenance cost; principle component analysis; time window; Algorithm design and analysis; Big data; Clustering algorithms; Educational institutions; Indexes; Maintenance engineering; Principal component analysis; data stream; high dimensional data; k nearest neighbor join;
Conference_Title :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
DOI :
10.1109/BigData.Congress.2014.121