• DocumentCode
    3144196
  • Title
    An effective data locality aware task scheduling method for MapReduce framework in heterogeneous environments
  • Author
    Zhang, Xiaohong ; Feng, Yuhong ; Feng, Shengzhong ; Fan, Jianping ; Ming, Zhong
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Henan Polytech. Univ., Jiaozuo, China
  • fYear
    2011
  • fDate
    12-14 Dec. 2011
  • Firstpage
    235
  • Lastpage
    242
  • Abstract
    Data locality has recently been exploited extensively in cloud computing to improve system performance. However, when scheduling Map tasks in a Hadoop MapReduce framework running in a heterogeneous environment, existing methods either cannot reduce the occurrence of these Map tasks or harm fairness, thus degrading system performance. To address this problem, this paper proposes a data locality aware scheduling method that improves Hadoop MapReduce system performance in heterogeneous computing environments. After receiving a request from a requesting node, our method preferentially schedules a task whose input data is stored on the requesting node. If no such task exists, our method selects the task whose input data is nearest to the requesting node, and then decides whether to reserve the task for the node storing the input data or to schedule the task on the requesting node, transferring the input data to it on the fly. As a proof of concept, we implement the method in Hadoop-0.20.2. To evaluate its performance, we carry out an experimental comparison of the proposed method against the default scheduling method used in Hadoop-0.20.2. The experimental results show that the proposed method improves data locality and reduces the normalized execution time as well as the response time of jobs.
  • Keywords
    cloud computing; distributed processing; scheduling; software performance evaluation; Hadoop MapReduce framework; Hadoop-0.20.2; data locality aware task scheduling method; default scheduling method; heterogeneous environments; performance evaluation; requesting node; system performance improvement; Cloud computing; Data communication; Distributed databases; Processor scheduling; Schedules; System performance; Time factors; Cloud Computing; Data Locality; Distributed Computing; MapReduce; Task Scheduling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud and Service Computing (CSC), 2011 International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4577-1635-5
  • Electronic_ISBN
    978-1-4577-1636-2
  • Type
    conf
  • DOI
    10.1109/CSC.2011.6138527
  • Filename
    6138527
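
The scheduling decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Hadoop-0.20.2 implementation. The names `Task`, `pick_task`, `distance`, `wait_time`, and `transfer_time` are hypothetical, each task is assumed to have a single input replica, and the reserve-or-transfer rule (compare the estimated wait at the data-holding node with the estimated transfer time) is an assumption, since the abstract does not state the criterion.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Task:
    task_id: str
    data_node: str  # node storing the task's input (single replica assumed)


def pick_task(
    requesting_node: str,
    pending: Sequence[Task],
    distance: Callable[[str, str], int],
    wait_time: Callable[[str], float],
    transfer_time: Callable[[str, str], float],
) -> Tuple[Optional[Task], str]:
    """Return (task, decision); decision is 'local', 'remote', 'reserve', or 'idle'."""
    if not pending:
        return None, "idle"

    # Step 1: prefer a task whose input data is stored on the requesting node.
    for task in pending:
        if task.data_node == requesting_node:
            return task, "local"

    # Step 2: otherwise take the task whose input data is nearest to the requester.
    nearest = min(pending, key=lambda t: distance(requesting_node, t.data_node))

    # Step 3 (assumed rule, not from the paper): reserve the task for the
    # data-holding node if that node is expected to become free sooner than
    # the input could be transferred; otherwise run it remotely.
    if wait_time(nearest.data_node) <= transfer_time(nearest.data_node, requesting_node):
        return nearest, "reserve"
    return nearest, "remote"
```

In this sketch a "reserve" result means the requesting node receives no task in this round and the selected task stays queued for the node that holds its input, while "remote" means the task runs on the requesting node and its input is fetched over the network.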