• DocumentCode
    2334760
  • Title
    A user-level library for fault tolerance on shared memory multicore systems
  • Author
    Mushtaq, Hamid ; Al-Ars, Zaid ; Bertels, Koen
  • Author_Institution
    Comput. Eng. Lab., Delft Univ. of Technol., Delft, Netherlands
  • fYear
    2012
  • fDate
    18-20 April 2012
  • Firstpage
    266
  • Lastpage
    269
  • Abstract
    The ever-decreasing transistor size has made it possible to integrate multiple cores on a single die. On the downside, this has introduced reliability concerns, as smaller transistors are more prone to both transient and permanent faults. However, the abundant extra processing resources of a multicore system can be exploited to provide fault tolerance through redundant execution. We have designed a library for multicore processing that can make a multithreaded user-level application fault tolerant with simple modifications to the code. It uses the abundant cores found in the system to perform redundant execution for error detection. It also allows recovery through checkpoint/rollback. Our library is portable, since it does not depend on any special hardware. Furthermore, the overhead our library adds to the original application (up to 46% for 4 threads) is lower than that of other existing approaches, such as Respec.
  • Keywords
    checkpointing; fault tolerant computing; libraries; multi-threading; redundancy; shared memory systems; checkpoint-rollback; error detection; multicore processing; multithreaded user-level application fault tolerance; redundant execution; reliability concerns; shared memory multicore systems; user-level library; Benchmark testing; Fault tolerance; Fault tolerant systems; Instruction sets; Libraries; Memory management; Multicore processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Design and Diagnostics of Electronic Circuits & Systems (DDECS), 2012 IEEE 15th International Symposium on
  • Conference_Location
    Tallinn
  • Print_ISBN
    978-1-4673-1187-8
  • Electronic_ISBN
    978-1-4673-1186-1
  • Type
    conf
  • DOI
    10.1109/DDECS.2012.6219071
  • Filename
    6219071