Title :
Relevance Matters: Capitalizing on Less (Top-k Matching in Publish/Subscribe)
Author :
Sadoghi, Mohammad ; Jacobsen, Hans-Arno
Author_Institution :
Dept. of Comput. Sci., Univ. of Toronto, Toronto, ON, Canada
Abstract :
The efficient processing of large collections of Boolean expressions plays a central role in major data intensive applications ranging from user-centric processing and personalization to real-time data analysis. Emerging applications such as computational advertising and selective information dissemination demand determining and presenting to an end-user only the most relevant content that is both user-consumable and suitable for limited screen real estate of target devices. To retrieve the most relevant content, we present BE*-Tree, a novel indexing data structure designed for effective hierarchical top-k pattern matching, which as its by-product also reduces the operational cost of processing millions of patterns. To further reduce processing cost, BE*-Tree employs an adaptive and non-rigid space-cutting technique designed to efficiently index Boolean expressions over a high-dimensional continuous space. At the core of BE*-Tree lie two innovative ideas: (1) a bi-directional tree expansion build as a top-down (data and space clustering) and a bottom-up growths (space clustering), which together enable indexing only non-empty continuous sub-spaces, and (2) an overlap-free splitting strategy. Finally, the performance of BE*-Tree is proven through a comprehensive experimental comparison against state-of-the-art index structures for matching Boolean expressions.
Keywords :
Boolean algebra; indexing; message passing; middleware; pattern matching; tree data structures; BE*-tree; Boolean expression collection processing; Boolean expression matching; bidirectional tree expansion; bottom-up growths; data intensive applications; hierarchical top-k pattern matching; indexing data structure; less-top-k matching capitalization; nonempty continuous subspaces; overlap-free splitting strategy; personalization; publish-subscribe; real-time data analysis; relevance matters; space-cutting technique; user-centric processing; Computational modeling; Data models; Heuristic algorithms; Indexing; Subscriptions;
Conference_Titel :
Data Engineering (ICDE), 2012 IEEE 28th International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4673-0042-1
DOI :
10.1109/ICDE.2012.38