Title :
Top-K range-aggregate queries on categorical data
Author :
Sanyal, Biswajit ; Gupta, Prosenjit ; Majumder, Subhashis
Author_Institution :
Dept. of Inf. Technol., Gov. Coll. of Eng. & Textile Technol., Serampore, India
Abstract :
Efficient processing of top-K queries is a crucial requirement in many interactive applications that deals with huge amount of data. In particular efficient top-K processing has shown a great impact on performance in domains such as the web, text and data integration, business analytics, distributed aggregation of network logs and sensor data, data mining and so on. In this work we focus on some top-K problems in Computational Geometry. We consider a set of problems defined on colored geometric objects (points/intervals) in R1. We are given a set S of n colored geometric objects in R1. Optionally, each object p has a weight w(p) ≥ 0. The number of colors is m ≤ n. For any color c and a query q, let f (c, q) be an aggregation function defined over c-colored objects intersecting q. For a given k ∈ [1 ... n], we define top-K(q) to be the set of k colors with the highest k f(c, q) values amongst the distinct colors of the objects intersecting in q. (If the number of distinct colors in q is less than k, we simply include all such colors in top-K(q)). Our goal is to preprocess S into a data structure so that given a query q and an integer k [1 ... n], the colors in top-K(q) can be reported efficiently. Efficient solutions are provided to instances of the above general problem in cases where (i) S is a set of colored points, q is a query interval and f(c, q) is the maximum/minimum weight of points of color c in q and (ii) S is a set of colored intervals, q is a query point and f(c, q) is the count of intervals of color c stabbed by q. We use techniques from Computational Geometry to solve these problems.
Keywords :
data handling; query processing; business analytics; categorical data; computational geometry; data integration; data mining; data structure; distributed aggregation; interactive applications; network logs; sensor data; text integration; top-k range-aggregate queries; web integration; Geometry; Image color analysis; colored range searching; computational geometry; persistent data structure; range-aggregrate queries; top-k problems;
Conference_Titel :
Emerging Trends and Applications in Computer Science (NCETACS), 2012 3rd National Conference on
Conference_Location :
Shillong
Print_ISBN :
978-1-4577-0749-0
DOI :
10.1109/NCETACS.2012.6203314