DocumentCode
3739723
Title
Impact of HDFS block size in MapReduce based segmentation and feature extraction algorithm
Author
Hameeza Ahmed;Muhammad Ali Ismail
Author_Institution
High Performance Computing Centre (HPCC), Department of Computer & Information Systems Engineering, NED University of Engineering & Technology, University Road, Karachi-75270, Pakistan
fYear
2015
Firstpage
58
Lastpage
63
Abstract
Apache Hadoop is one of the leading open source frameworks for processing big data. Despite its huge success, Hadoop fails to address many significant real-world problems, and its inability to handle data dependencies efficiently is one of the major issues among them. This paper highlights the data dependency issue in the Hadoop framework using a newly developed MapReduce-based segmentation and feature extraction algorithm applied to a very large dataset with strong data dependencies. Data dependency is managed by varying the HDFS block size. With a smaller block size, which yields larger data dependency and greater parallelism, the framework produces unstable results. As the block size is increased, i.e., as the data dependency is minimized, the framework starts to show stable results at the cost of parallelism.
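The block-size variation described in the abstract can be sketched with the standard Hadoop shell. This is an illustrative setup, not the authors' exact experiment; the file names, paths, and sizes are assumptions. Per-file block size is set at write time with the `dfs.blocksize` property:

```shell
# Sketch (assumed paths/sizes): upload the same input dataset twice with
# different HDFS block sizes, as one would to study the block-size effect.

# 64 MB blocks: more blocks per file, hence more map tasks (greater
# parallelism) but larger cross-block data dependency.
hdfs dfs -D dfs.blocksize=67108864 -put input.dat /data/input_64m.dat

# 256 MB blocks: fewer blocks, less parallelism, smaller cross-block
# data dependency.
hdfs dfs -D dfs.blocksize=268435456 -put input.dat /data/input_256m.dat
```

Alternatively, `dfs.blocksize` can be set cluster-wide in `hdfs-site.xml`; the per-command `-D` override only affects the files being written.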
Keywords
"Feature extraction","Programming","Mathematical model","Parallel processing","Distributed databases","Tuning","Algorithm design and analysis"
Publisher
ieee
Conference_Titel
2015 International Conference on Open Source Systems & Technologies (ICOSST)
Type
conf
DOI
10.1109/ICOSST.2015.7396403
Filename
7396403
Link To Document