• DocumentCode
    3739723
  • Title
    Impact of HDFS block size in MapReduce based segmentation and feature extraction algorithm
  • Author
    Hameeza Ahmed;Muhammad Ali Ismail
  • Author_Institution
    High Performance Computing Centre (HPCC) Department of Computer & Information Systems Engineering, NED University of Engineering & Technology, University Road, Karachi-75270, Pakistan
  • fYear
    2015
  • Firstpage
    58
  • Lastpage
    63
  • Abstract
    Apache Hadoop is one of the open source frameworks for processing big data. Despite its huge success, Hadoop falls short in addressing many significant real-world problems; its inability to handle data dependencies efficiently is one of the major issues. This paper highlights the data dependency issue in the Hadoop framework, using a newly developed MapReduce based segmentation and feature extraction algorithm applied to a very large dataset with data dependencies. Data dependency is managed by varying the HDFS block size. With a smaller block size, which yields larger data dependency and greater parallelism, the framework shows unstable results. As the block size is increased, thereby minimizing the data dependency, the framework begins to show stable results at the cost of parallelism.
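  • Note
    The block-size variation described in the abstract can be reproduced through standard Hadoop configuration. A minimal sketch, assuming Hadoop 2.x property names; the value shown is illustrative, not one of the paper's actual experimental settings:

    ```xml
    <!-- hdfs-site.xml: cluster-wide default HDFS block size (illustrative value) -->
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value> <!-- 128 MB; a larger block reduces cross-block data dependency -->
    </property>
    ```

    The block size can also be overridden per file at write time, e.g. `hdfs dfs -D dfs.blocksize=268435456 -put input.dat /data/`, which makes it straightforward to rerun the same MapReduce job under different split granularities.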
  • Keywords
    "Feature extraction","Programming","Mathematical model","Parallel processing","Distributed databases","Tuning","Algorithm design and analysis"
  • Publisher
    IEEE
  • Conference_Titel
    2015 International Conference on Open Source Systems & Technologies (ICOSST)
  • Type
    conf
  • DOI
    10.1109/ICOSST.2015.7396403
  • Filename
    7396403