Title of article :
Improving I/O Efficiency in Hadoop-Based Massive Data Analysis Programs
Author/Authors :
Lee, Kyong-Ha (Research Data Hub Center, Korea Institute of Science and Technology Information, Daejeon, Republic of Korea); Kang, Woo Lam (School of Computing, KAIST, Daejeon, Republic of Korea); Suh, Young-Kyoon (School of Computer Science and Engineering, Kyungpook National University, Daegu, Republic of Korea)
Pages :
10
From page :
1
To page :
10
Abstract :
Apache Hadoop has been a popular parallel processing tool in the era of big data. While practitioners have rewritten many conventional analysis algorithms to adapt them to Hadoop, the issue of inefficient I/O in Hadoop-based programs has been repeatedly reported in the literature. In this article, we address the problem of I/O inefficiency in Hadoop-based massive data analysis by introducing our efficient modification of Hadoop. We first incorporate a columnar data layout into the conventional Hadoop framework, without any modification of the Hadoop internals. We also provide Hadoop with indexing capability to save a large amount of I/O when processing not only selection predicates but also star-join queries, which are common in many analysis tasks.
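To illustrate the general idea of plugging a columnar layout into stock Hadoop without touching its internals, the sketch below configures a plain MapReduce job to read a Parquet file through parquet-mr's example bindings and evaluates a selection predicate in the mapper. This is not the authors' implementation; the input path, field names (price, customer_id), and the projected schema are hypothetical, and the "parquet.read.schema" key follows parquet-mr's GroupReadSupport convention.

// Minimal sketch, assuming a Parquet input file and the parquet-mr example bindings.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.example.ExampleInputFormat;

public class ColumnarScanJob {

  // The mapper receives one record (Group) at a time; only the columns named in
  // the projected read schema are materialized from the columnar file.
  public static class SelectMapper extends Mapper<Void, Group, Text, LongWritable> {
    @Override
    protected void map(Void key, Group row, Context ctx)
        throws java.io.IOException, InterruptedException {
      // Selection predicate evaluated per record: price > 100.0 (hypothetical field).
      double price = row.getDouble("price", 0);
      if (price > 100.0) {
        ctx.write(new Text(row.getString("customer_id", 0)), new LongWritable(1));
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Column projection: read only the two needed columns. The schema string is
    // assumed to match the file's schema.
    conf.set("parquet.read.schema",
        "message projected { required double price; required binary customer_id (UTF8); }");

    Job job = Job.getInstance(conf, "columnar-selection-scan");
    job.setJarByClass(ColumnarScanJob.class);
    job.setMapperClass(SelectMapper.class);
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);

    // Columnar input is supplied purely through the InputFormat; no Hadoop
    // internals are modified.
    job.setInputFormatClass(ExampleInputFormat.class);
    ExampleInputFormat.addInputPath(job, new Path(args[0]));
    job.setOutputFormatClass(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Because the selection touches only the projected columns, the job reads far less data than a row-oriented scan of the same records; the article's indexing support for selection and star-join predicates pushes this further by skipping irrelevant blocks altogether.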
Keywords :
Improving I/O, Data Analysis Programs, Hadoop-Based
Journal title :
Scientific Programming
Serial Year :
2018
Full Text URL :
Record number :
2608396
Link To Document :