Title :
Managing Soft-Errors in Transactional Systems
Author :
Mohamedin, Mohamed ; Palmieri, Roberto ; Ravindran, Binoy
Author_Institution :
Electr. & Comput. Eng. Dept., Virginia Tech, Blacksburg, VA, USA
Abstract :
Multicore architectures are becoming increasingly prone to soft errors, i.e., transient faults caused by external physical phenomena such as electric noise and cosmic particle strikes. As core counts increase, the soft-error rate grows with the rising transistor density on chips. The impact of these errors on business-critical applications deployed on multicore hardware can be significant. We present an active replication-based approach that fully masks such errors for transactional applications. We partition computational cores, fully replicate objects across partitions, and concurrently execute transactional requests on all partitions, thereby enabling completely local object accesses. Transactional requests are globally ordered and delivered across partitions using optimistic atomic broadcast. Hardware message passing, an important emerging trend in multicore architectures, is exploited to mitigate communication costs. We report preliminary results obtained with an implementation of our approach on 36-core Tilera TILE-Gx hardware with an on-chip scalable mesh network.
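A minimal Python sketch of the general active-replication idea, for illustration only. It assumes deterministic transactions and masks a faulty partition by majority vote over the results of three or more partitions. The abstract does not say how errors are detected, so the voting step, the `Partition` class, `deliver_in_order`, and the bit-flip fault model are all hypothetical. The total order that optimistic atomic broadcast would provide is not modeled; it is simply given as a list.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical partition of cores holding a full replica of the objects."""
    name: str
    store: dict = field(default_factory=dict)
    faulty: bool = False  # hypothetical: models a soft error in this partition

    def execute(self, txn):
        """Deterministically apply a transaction given as (key, delta) increments."""
        for key, delta in txn:
            value = self.store.get(key, 0) + delta
            if self.faulty:
                value ^= 1  # flip the low bit to model a transient bit flip
            self.store[key] = value
        # Result: the values of the keys this transaction touched, in key order.
        return tuple(sorted((k, self.store[k]) for k, _ in txn))


def deliver_in_order(partitions, ordered_txns):
    """Run every transaction, in the same global order, on all partitions.

    A result is accepted only if a strict majority of partitions agree on it,
    so a single faulty partition out of three is masked. Repairing the faulty
    replica's state is out of scope for this sketch.
    """
    committed = []
    for txn in ordered_txns:
        results = [p.execute(txn) for p in partitions]
        winner, votes = Counter(results).most_common(1)[0]
        if votes <= len(partitions) // 2:
            raise RuntimeError("no majority: error cannot be masked")
        committed.append(winner)
    return committed
```

For example, with three partitions where one is faulty, `deliver_in_order(parts, [[("x", 5)]])` still commits `(("x", 5),)`, because the two healthy partitions outvote the corrupted one. With only two partitions a disagreement cannot be resolved, which is why voting-based masking needs at least three replicas.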
Keywords :
computer architecture; concurrency control; multiprocessing systems; radiation hardening (electronics); Tilera TILE-Gx hardware; active replication-based approach; business-critical applications; communication cost mitigation; computational core partitioning; concurrent transactional request execution; core counts; cosmic particle; electric noise; error masking; external physical phenomena; globally delivered transactional requests; globally ordered transactional requests; hardware message passing; local object access; multicore architectures; multicore hardware; object replication; on-chip scalable mesh network; optimistic atomic broadcast; soft-error management; soft-error rate; transactional applications; transactional systems; transient faults; transistor density; Concurrency control; Hardware; Message systems; Multicore processing; Protocols; Throughput; Active Replication; Soft Errors; Transaction Processing;
Conference_Title :
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-4117-9
DOI :
10.1109/IPDPSW.2014.148