• DocumentCode
    3495607
  • Title
    A genetic-based optimal checkpoint placement strategy for multicore processors
  • Author
    Lotfi, Atieh ; Safari, Saeed
  • Author_Institution
    Sch. of Electr. & Comput. Eng., Univ. of Tehran, Tehran, Iran
  • fYear
    2012
  • fDate
    2-3 May 2012
  • Firstpage
    172
  • Lastpage
    177
  • Abstract
    Multicore processors are increasingly deployed in high-performance computing systems. As system complexity grows, the probability of failure increases substantially, so these systems require fault-tolerance techniques. Checkpointing is widely used to reduce the execution time of long-running programs in the presence of failures and to enhance the reliability of such systems. Optimizing the number of checkpoints in a parallel application running on a multicore processor is a complicated and challenging task. Infrequent checkpointing results in long reprocessing time, while overly short checkpointing intervals lead to high checkpointing overhead. Because this is a multi-objective optimization problem, a search can easily become trapped in local optima. Bio-inspired algorithms, on the other hand, are powerful function optimizers that have been used successfully to solve problems in many different areas. In this paper, a genetic algorithm, a well-known bio-inspired computing algorithm, is applied to find the optimal checkpoint placement in parallel applications. Under certain fault conditions, this new checkpoint placement strategy outperforms existing ones, with a significant reduction in total wasted time. Our experimental results show that our method, which is implementable on any message-passing multicore system, can optimally find suitable points at which checkpoints should be taken.
  • Keywords
    checkpointing; genetic algorithms; message passing; multiprocessing systems; probability; bio-inspired algorithm; checkpointing overhead; checkpointing technique; failure probability; fault tolerance; genetic algorithm; genetic-based optimal checkpoint placement strategy; high performance computing system; message-passing multicore system; multicore processor; multiobjective optimization problem; parallel application; program execution time; reprocessing time; Benchmark testing; Biological cells; Checkpointing; Genetic algorithms; Genetics; Multicore processing; Program processors; Fault Tolerance; Genetic Algorithm; Multicore Architectures; Optimal Checkpoint Placement;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture and Digital Systems (CADS), 2012 16th CSI International Symposium on
  • Conference_Location
    Shiraz, Fars
  • Print_ISBN
    978-1-4673-1481-7
  • Type
    conf
  • DOI
    10.1109/CADS.2012.6316440
  • Filename
    6316440