DocumentCode
1087
Title
λ-diverse nearest neighbors browsing for multidimensional data
Author
Kucuktunc, Onur ; Ferhatosmanoglu, Hakan
Author_Institution
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Volume
25
Issue
3
fYear
2013
fDate
Mar-13
Firstpage
481
Lastpage
493
Abstract
Traditional search methods try to obtain the most relevant information and rank it according to the degree of similarity to the queries. Diversity in query results is also preferred by a variety of applications since results very similar to each other cannot capture all aspects of the queried topic. In this paper, we focus on the lambda-diverse k-nearest neighbor search problem on spatial and multidimensional data. Unlike the approach of diversifying query results in a postprocessing step, we naturally obtain diverse results with the proposed geometric and index-based methods. We first make an analogy with the concept of Natural Neighbors (NatN) and propose a natural neighbor-based method for 2D and 3D data and an incremental browsing algorithm based on Gabriel graphs for higher dimensional spaces. We then introduce a diverse browsing method based on the distance browsing feature of spatial index structures, such as R-trees. The algorithm maintains a Priority Queue with mindivdist of the objects depending on both relevancy and angular diversity and efficiently prunes nondiverse items and nodes. We experiment with a number of spatial and high-dimensional data sets, including Factual´s (http://www.factual.com/) US points-of-interest data set of 13M entries. On the experimental setup, the diverse browsing method is shown to be more efficient (regarding disk accesses) than k-NN search on R-trees, and more effective (regarding Maximal Marginal Relevance (MMR)) than the diverse nearest neighbor search techniques found in the literature.
Keywords
computational geometry; query processing; queueing theory; relevance feedback; search problems; set theory; tree data structures; trees (mathematics); λ-diverse k-nearest neighbor search problem; λ-diverse nearest neighbors browsing; 2D data; 3D data; Factual US points-of-interest data set; Gabriel graph; MMR; R-trees; angular diversity; distance browsing feature; diverse browsing method; geometric methods; high-dimensional data sets; higher dimensional spaces; incremental browsing algorithm; index-based methods; information ranking; k-NN search; maximal marginal relevance; multidimensional data; natural neighbor-based method; object mindivdist; priority queue; spatial data sets; spatial index structures; topic query; Diversity methods; Information retrieval; Nearest neighbor searches; Query processing; Search methods; Search problems; Spatial databases; Diversity; Gabriel graph; angular similarity; diverse nearest neighbor search; natural neighbors;
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
ieee
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2011.251
Filename
6095558
Link To Document