DocumentCode :
3525323
Title :
AntsBOA: A New Time Series Pipeline for Big Data Processing, Analyzing and Querying in Online Advertising Application
Author :
Bin Song ; Shaosu Liu ; Kolay, Santanu ; Lo, Lawrence
Author_Institution :
Turn Inc., Redwood City, CA, USA
fYear :
2015
fDate :
March 30 to April 2, 2015
Firstpage :
223
Lastpage :
232
Abstract :
This paper presents AntsBOA, a new pipeline for big data processing, analyzing and querying. The pipeline was initially designed for an online advertising application, but it is easy to extend to other big data applications. The main idea is that AntsBOA is based on time series technology. Data processing in AntsBOA comprises three levels: aggregation, time series and cache. Time series data and cache data are loaded into a distributed database system named Kodiak. A query server then queries the data in Kodiak and returns the result. The pipeline has been running in production for half a year. In our production system, the prior 16 months of performance data can be populated in less than half an hour, and the response time for querying those 16 months of performance data is less than several milliseconds on average. In addition, our production results show that the cache level is tens of times faster than the aggregation level in terms of query time, that the time series cache level achieves a 50% speedup over the cache level in terms of Hadoop resources, and that time series loading is about 10 times faster than traditional loading. Our production system is also monitored to keep it in a healthy and stable state. In summary, AntsBOA is an efficient, accurate, recoverable, scalable and fault-tolerant pipeline for big data processing, analyzing and querying.
Keywords :
Big Data; advertising data processing; cache storage; distributed databases; query processing; time series; AntsBOA; Big Data analyzing; Big Data applications; Big Data processing; Big Data querying; Kodiak; aggregation level; cache data; cache level; distributed database system; online advertising application; query server; time series data; time series technology; Advertising; Big data; Distributed databases; Pipelines; Servers; Time series analysis; Extract Transform Load (ETL); big data; distributed systems; online advertising; time series;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data Computing Service and Applications (BigDataService), 2015 IEEE First International Conference on
Conference_Location :
Redwood City, CA
Type :
conf
DOI :
10.1109/BigDataService.2015.32
Filename :
7184885