DocumentCode :
3717433
Title :
A memory capacity model for high performing data-filtering applications in Samza framework
Author :
Tao Feng;Zhenyun Zhuang;Yi Pan;Haricharan Ramachandra
Author_Institution :
LinkedIn Corp, 2029 Stierlin Court, Mountain View, CA 94043, USA
fYear :
2015
Firstpage :
2600
Lastpage :
2605
Abstract :
Data quality is essential in the big data paradigm, as poor data can have serious consequences when dealing with large volumes of data. While it is trivial to spot poor data in small-scale and offline use cases, it is challenging to detect and fix data inconsistency in large-scale and online (real-time or near-real-time) big data contexts. An example of such a scenario is spotting and fixing poor data using Apache Samza, a stream processing framework that has been increasingly adopted to process near-real-time data at LinkedIn. To optimize the deployment of Samza processing and reduce business cost, in this work we propose a memory capacity model for Apache Samza that allows denser deployments of high-performing data-filtering applications built on Samza. The model can be used to provision just-enough memory to applications by tightening the bounds on memory allocations. We apply our memory capacity model to LinkedIn's real production use cases, which significantly increases deployment density and saves business costs. We share key learnings in this paper.
Keywords :
"Containers","LinkedIn","Data models","Big data","Java","Measurement","Real-time systems"
Publisher :
ieee
Conference_Titel :
2015 IEEE International Conference on Big Data (Big Data)
Type :
conf
DOI :
10.1109/BigData.2015.7364058
Filename :
7364058
Link To Document :