• DocumentCode
    2984038
  • Title
    Adaptive Failure Detection via Heartbeat under Hadoop
  • Author
    Zhu, Hao ; Chen, Haopeng
  • Author_Institution
    Sch. of Software, Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2011
  • fDate
    12-15 Dec. 2011
  • Firstpage
    231
  • Lastpage
    238
  • Abstract
    Hadoop has become a popular framework for processing massive data sets on large-scale clusters. However, the detection of a failed worker is observed to be delayed, which can significantly increase the completion time of jobs with different workloads. To address this, we present two mechanisms, the Adaptive interval and the Reputation-based Detector, that help Hadoop detect a failed worker in the shortest time. The Adaptive interval dynamically configures the expiration time so that it adapts to the job size. The Reputation-based Detector evaluates the reputation of each worker; once a worker's reputation falls below a threshold, the worker is considered failed. Our experiments demonstrate that both strategies substantially improve the detection of failed workers. Specifically, the Adaptive interval performs relatively better with small jobs, while the Reputation-based Detector is more suitable for large jobs.
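    The two mechanisms named in the abstract can be illustrated with a minimal sketch. Nothing below is taken from the paper: the function `adaptive_expiry`, the class `ReputationDetector`, the linear scaling rule, and every constant (base time, cap, penalty, reward, threshold) are hypothetical choices made only to show the general idea of a job-size-dependent expiration time and a threshold on a per-worker reputation score.

```python
from dataclasses import dataclass, field


def adaptive_expiry(job_size_mb, base_s=30.0, per_gb_s=60.0, cap_s=600.0):
    """Hypothetical rule: expiration time grows linearly with job size, up to a cap.

    The abstract only says the expiration time adapts to job size; the linear
    form and all constants here are assumptions for illustration.
    """
    return min(cap_s, base_s + per_gb_s * job_size_mb / 1024.0)


@dataclass
class ReputationDetector:
    """Hypothetical reputation tracker: scores live in [0, 1], starting at 1."""

    threshold: float = 0.5  # assumed failure threshold
    penalty: float = 0.2    # assumed drop per missed heartbeat
    reward: float = 0.1     # assumed gain per heartbeat received
    reputation: dict = field(default_factory=dict)

    def record(self, worker, heartbeat_arrived):
        # Raise the score on a received heartbeat, lower it on a missed one,
        # then clamp the result to the [0, 1] range.
        score = self.reputation.get(worker, 1.0)
        score = score + self.reward if heartbeat_arrived else score - self.penalty
        self.reputation[worker] = max(0.0, min(1.0, score))

    def is_failed(self, worker):
        # A worker counts as failed once its score drops below the threshold.
        return self.reputation.get(worker, 1.0) < self.threshold


if __name__ == "__main__":
    detector = ReputationDetector()
    for _ in range(3):
        detector.record("worker-1", heartbeat_arrived=False)
    print(adaptive_expiry(1024), detector.is_failed("worker-1"))
```

    In this sketch a small job gets a short expiration time, so a silent worker is declared lost quickly, while the reputation score lets a detector react to a run of missed heartbeats without waiting for a fixed timeout.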
  • Keywords
    distributed programming; software fault tolerance; Hadoop; adaptive failure detection; adaptive interval; failed worker; job size; large scale cluster; massive data set; reputation-based detector; Detectors; Educational institutions; Fault tolerance; Fault tolerant systems; Heart beat; Heart rate variability; Runtime; Cloud computing; MapReduce; adaptive heartbeat; failure detection
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services Computing Conference (APSCC), 2011 IEEE Asia-Pacific
  • Conference_Location
    Jeju Island
  • Print_ISBN
    978-1-4673-0206-7
  • Type
    conf
  • DOI
    10.1109/APSCC.2011.46
  • Filename
    6127967