  • DocumentCode
    167549
  • Title
    Managing Soft-Errors in Transactional Systems
  • Author
    Mohamedin, Mohamed; Palmieri, Roberto; Ravindran, Binoy
  • Author_Institution
    Electr. & Comput. Eng. Dept., Virginia Tech, Blacksburg, VA, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    1324
  • Lastpage
    1329
  • Abstract
    Multicore architectures are becoming increasingly prone to soft-errors, i.e., transient faults caused by external physical phenomena such as electric noise and cosmic particle strikes. With increasing core counts, the soft-error rate is growing due to the rising transistor density on chips. The impact of these errors on business-critical applications deployed on multicore hardware can be significant. We present an active replication-based approach that fully masks such errors for transactional applications. We partition computational cores, fully replicate objects across partitions, and concurrently execute transactional requests on all partitions, thereby enabling completely local object accesses. Transactional requests are globally ordered and delivered across partitions using optimistic atomic broadcast. Hardware message passing, an important emerging trend in multicore architectures, is exploited to mitigate communication costs. We report preliminary results obtained with an implementation of our approach on 36-core Tilera TILE-Gx hardware with an on-chip scalable mesh network.
  • Keywords
    computer architecture; concurrency control; multiprocessing systems; radiation hardening (electronics); Tilera TILE-Gx hardware; active replication-based approach; business-critical applications; communication cost mitigation; computational core partitioning; concurrent transactional request execution; core counts; cosmic particle; electric noise; error masking; external physical phenomena; globally delivered transactional requests; globally ordered transactional requests; hardware message passing; local object access; multicore architectures; multicore hardware; object replication; on-chip scalable mesh network; optimistic atomic broadcast; soft-error management; soft-error rate; transactional applications; transactional systems; transient faults; transistor density; Concurrency control; Hardware; Message systems; Multicore processing; Protocols; Throughput; Active Replication; Soft Errors; Transaction Processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
  • Conference_Location
    Phoenix, AZ
  • Print_ISBN
    978-1-4799-4117-9
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2014.148
  • Filename
    6969532