DocumentCode :
3717433
Title :
A memory capacity model for high performing data-filtering applications in Samza framework
Author :
Tao Feng;Zhenyun Zhuang;Yi Pan;Haricharan Ramachandra
Author_Institution :
LinkedIn Corp, 2029 Stierlin Court, Mountain View, CA 94043, USA
fYear :
2015
Firstpage :
2600
Lastpage :
2605
Abstract :
Data quality is essential in the big data paradigm, as poor data can have serious consequences when dealing with large volumes of data. While it is trivial to spot poor data in small-scale and offline use cases, it is challenging to detect and fix data inconsistency in large-scale and online (real-time or near-real-time) big data contexts. An example of such a scenario is spotting and fixing poor data using Apache Samza, a stream processing framework that has been increasingly adopted to process near-real-time data at LinkedIn. To optimize the deployment of Samza processing and reduce business cost, in this work we propose a memory capacity model for Apache Samza that allows denser deployments of high-performing data-filtering applications built on Samza. The model can be used to provision just-enough memory to applications by tightening the bounds on memory allocations. We apply our memory capacity model to LinkedIn's real production use cases, which significantly increases deployment density and saves business costs. We share key learnings in this paper.
Keywords :
"Containers","LinkedIn","Data models","Big data","Java","Measurement","Real-time systems"
Publisher :
ieee
Conference_Titel :
2015 IEEE International Conference on Big Data (Big Data)
Type :
conf
DOI :
10.1109/BigData.2015.7364058
Filename :
7364058
Link To Document :