Title :
Mars: Real-time spatio-temporal queries on microblogs
Author :
Magdy, Ahmed ; Aly, Ahmed M. ; Mokbel, Mohamed F. ; Elnikety, Sameh ; He, Yuxiong ; Nath, Siddhartha
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Minnesota, Minneapolis, MN, USA
Date :
March 31, 2014 - April 4, 2014
Abstract :
The Mars demonstration exploits microblog location information to support a wide variety of important spatio-temporal queries on microblogs. Supported queries include range, nearest-neighbor, and aggregate queries. Mars operates in a challenging environment where microblog streams arrive at high rates. Mars distinguishes itself with three novel contributions: (1) Efficient in-memory digestion/expiration techniques that can handle arrival rates of up to 64,000 microblogs/sec. This also includes highly accurate and efficient hopping-window-based aggregation of incoming microblog keywords. (2) Smart memory optimization and load shedding techniques that adjust in-memory contents based on the expected query load, trading a slight, bounded accuracy loss for significant storage savings. (3) Scalable real-time query processing that exploits Zipf-distributed microblog data for efficient top-k aggregate query processing. In addition, Mars employs a scalable real-time nearest-neighbor and range query processing module that uses various pruning techniques to serve heavy query workloads in real time. Mars is demonstrated using a stream of real tweets obtained from the Twitter firehose with a production query workload obtained from Bing web search. We show that Mars serves incoming queries with an average latency of less than 4 msec and with 99% answer accuracy while saving up to 70% of storage overhead for different query loads.
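Illustrative sketch (not part of the original record): the abstract mentions hopping-window aggregation of keywords and top-k aggregate queries without giving details. The Python below is a minimal, generic example of that idea, assuming non-decreasing timestamps and a window length that is a multiple of the hop length. The class name HoppingWindowCounter and all of its parameters are hypothetical, and this is not claimed to be how Mars implements aggregation.

```python
from collections import Counter, deque


class HoppingWindowCounter:
    """Keyword counts over the most recent window, advanced one hop at a time.

    Hypothetical illustration only; names and design are assumptions,
    not taken from the Mars system.
    """

    def __init__(self, window_sec, hop_sec):
        if window_sec % hop_sec != 0:
            raise ValueError("window_sec must be a multiple of hop_sec")
        self.hop_sec = hop_sec
        self.num_buckets = window_sec // hop_sec
        self.buckets = deque()   # (bucket_id, Counter) pairs, oldest first
        self.totals = Counter()  # running counts over all live buckets

    def _expire(self, current_bucket):
        # Drop whole buckets that have fallen out of the window.
        oldest_live = current_bucket - self.num_buckets + 1
        while self.buckets and self.buckets[0][0] < oldest_live:
            _, old = self.buckets.popleft()
            for keyword, n in old.items():
                self.totals[keyword] -= n
                if self.totals[keyword] <= 0:
                    del self.totals[keyword]

    def add(self, timestamp, keywords):
        # Assumes timestamps arrive in non-decreasing order.
        bucket_id = int(timestamp // self.hop_sec)
        self._expire(bucket_id)
        if not self.buckets or self.buckets[-1][0] != bucket_id:
            self.buckets.append((bucket_id, Counter()))
        self.buckets[-1][1].update(keywords)
        self.totals.update(keywords)

    def top_k(self, k, now):
        # Most frequent keywords among the buckets still inside the window.
        self._expire(int(now // self.hop_sec))
        return self.totals.most_common(k)


if __name__ == "__main__":
    counter = HoppingWindowCounter(window_sec=60, hop_sec=20)
    counter.add(0, ["mars", "icde"])
    counter.add(25, ["mars"])
    counter.add(45, ["mars", "tweet"])
    print(counter.top_k(1, now=45))
```

Expiring whole buckets keeps per-microblog work small: each arrival touches only the newest bucket and the running totals, and old data is discarded one hop at a time rather than one microblog at a time.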
Keywords :
Internet; query processing; social networking (online); Bing Web search; Mars; Twitter firehose; Zipf distributed microblogs data; aggregate queries; heavy query workloads; hopping-window based aggregation; in-memory digestion-expiration techniques; load shedding techniques; microblogs keywords; microblogs location information; nearest-neighbor queries; production query workload; pruning techniques; range queries; real-time query processing; real-time spatio-temporal queries; smart memory optimization; top-k aggregate query processing; Aggregates; Indexes; Mars; Memory management; Query processing; Real-time systems; Twitter;
Conference_Title :
Data Engineering (ICDE), 2014 IEEE 30th International Conference on
Conference_Location :
Chicago, IL
DOI :
10.1109/ICDE.2014.6816750