• DocumentCode
    3739723
  • Title
    Impact of HDFS block size in MapReduce based segmentation and feature extraction algorithm
  • Author
    Hameeza Ahmed;Muhammad Ali Ismail
  • Author_Institution
    High Performance Computing Centre (HPCC) Department of Computer & Information Systems Engineering, NED University of Engineering & Technology, University Road, Karachi-75270, Pakistan
  • fYear
    2015
  • Firstpage
    58
  • Lastpage
    63
  • Abstract
    Apache Hadoop is one of the open source frameworks for processing big data. Despite its huge success, Hadoop falls short in addressing many significant real-world problems; its inability to handle data dependencies efficiently is one of the major issues. This paper highlights the data dependency issue in the Hadoop framework, using a newly developed MapReduce based segmentation and feature extraction algorithm applied to a very large dataset with data dependencies. Data dependency is managed by varying the HDFS block size. With a smaller block size, which yields larger data dependency and greater parallelism, the framework shows unstable results. As the block size is increased, thereby minimizing the data dependency, the framework begins to show stable results at the cost of parallelism.
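  • Note
    The block-size variation described in the abstract can be reproduced through standard Hadoop configuration. A minimal sketch, assuming Hadoop 2.x property names; the value shown is illustrative, not one of the paper's actual experimental settings:

    ```xml
    <!-- hdfs-site.xml: cluster-wide default HDFS block size (illustrative value) -->
    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value> <!-- 128 MB; a larger block reduces cross-block data dependency -->
    </property>
    ```

    The block size can also be overridden per file at write time, e.g. `hdfs dfs -D dfs.blocksize=268435456 -put input.dat /data/`, which makes it straightforward to rerun the same MapReduce job under different split granularities.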
  • Keywords
    "Feature extraction","Programming","Mathematical model","Parallel processing","Distributed databases","Tuning","Algorithm design and analysis"
  • Publisher
    IEEE
  • Conference_Titel
    2015 International Conference on Open Source Systems & Technologies (ICOSST)
  • Type
    conf
  • DOI
    10.1109/ICOSST.2015.7396403
  • Filename
    7396403