The Berkeley Data Analytics Stack: Present and future

Author

Franklin, Michael

Author_Institution

UC Berkeley, Berkeley, CA, USA

fYear

2013

fDate

6-9 Oct. 2013

Firstpage

Lastpage

Abstract

The Berkeley AMPLab was founded on the idea that the challenges of emerging Big Data applications require a new approach to analytics systems. Launched in early 2011, the project set out to rethink the traditional analytics stack, breaking down technical and intellectual barriers that had arisen during decades of evolutionary development. The vision of the lab is to seamlessly integrate the three main resources available for making sense of data at scale: Algorithms (such as machine learning and statistical techniques), Machines (in the form of scalable clusters and elastic cloud computing), and People (both individually as analysts and en masse, as with crowdsourced human computation). To pursue this goal, we assembled a research team with diverse interests across computer science, forged relationships with domain experts on campus and elsewhere, and obtained the support of leading industry partners and major government sponsors. The lab is realizing its ideas through the development of a freely available Open Source software stack called BDAS: the Berkeley Data Analytics Stack. In the nearly three years the lab has been in operation, we've released major components of BDAS. Several of these components have gained significant traction in industry and elsewhere: the Mesos cluster resource manager, the Spark in-memory computation framework, and the Shark query processing system. BDAS shows up prominently in many industry discussions of the future of the Big Data analytics ecosystem - a rare degree of impact for an ongoing academic project. Given this initial success, the lab is continuing on its research path, moving "up the stack" to better integrate and support deep machine learning and to make people a full-fledged resource for making sense of Big Data. In this talk, I'll first outline the motivation and insights behind our research approach and describe how we have organized to address the cross-disciplinary nature of Big Data challenges.
I will then describe the current state of BDAS, with an emphasis on the key components listed above, and will address our current efforts on machine learning scalability and ease of use, and on hybrid human/computer processing. Finally, I will present our current views of how all the pieces will fit together to form a system that can adaptively bring the right resources to bear on a given data-driven question to meet time, cost and quality requirements throughout the analytics lifecycle.

Keywords

Big Data; cloud computing; learning (artificial intelligence); Berkeley data analytics stack; Mesos cluster resource manager; Shark query processing system; Spark in-memory computation framework; elastic cloud computing; evolutionary development; hybrid human-computer processing; machine learning; open source software stack; scalable cluster; statistical technique

fLanguage

English

Publisher

IEEE

Conference_Titel

Big Data, 2013 IEEE International Conference on

Conference_Location

Silicon Valley, CA

Type

conf

DOI

10.1109/BigData.2013.6691545

Filename

6691545

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=659396