Author :
Fox, Geoffrey C. ; Qiu, Judy ; Kamburugamuve, Supun ; Jha, Shantenu ; Luckow, Andre
Author_Institution :
Sch. of Inf. & Comput., Indiana Univ. Bloomington, Bloomington, IN, USA
Abstract :
We review the High Performance Computing Enhanced Apache Big Data Stack (HPC-ABDS) and summarize its capabilities in 21 identified architecture layers. These cover Message and Data Protocols, Distributed Coordination, Security & Privacy, Monitoring, Infrastructure Management, DevOps, Interoperability, File Systems, Cluster & Resource Management, Data Transport, File Management, NoSQL, SQL (NewSQL), Extraction Tools, Object-Relational Mapping, In-Memory Caching and Databases, Inter-Process Communication, Batch Programming Model and Runtime, Stream Processing, High-Level Programming, Application Hosting and PaaS, Libraries and Applications, and Workflow and Orchestration. We summarize the status of these layers, focusing on issues of importance for data analytics, and highlight areas where HPC and ABDS have good opportunities for integration.
Keywords :
Big Data; SQL; cache storage; data privacy; monitoring; open systems; parallel processing; security of data; Apache Big Data Stack; DevOps; HPC-ABDS; NewSQL; NoSQL; batch programming model; data transport; distributed coordination; file management; file systems; high performance computing; in-memory caching; infrastructure management; interoperability; message and data protocols; object-relational mapping; privacy; resource management; security; stream processing; Cloud computing; Distributed databases; Google; Programming; HPC