DocumentCode :
2970581
Title :
3D memory stacking for fast checkpointing/restore applications
Author :
Xie, Jing ; Dong, Xiangyu ; Xie, Yuan
Author_Institution :
Comput. Sci. & Eng. Dept., Pennsylvania State Univ., University Park, PA, USA
fYear :
2010
fDate :
16-18 Nov. 2010
Firstpage :
1
Lastpage :
6
Abstract :
As technology scales, modern massively parallel processing (MPP) systems face large system overhead caused by high failure rates. To provide system-level fault tolerance, traditional in-disk checkpointing/restart schemes are usually adopted, periodically dumping system states and memory contents to hard disk drives (HDDs). When errors occur, the system can be restored by reading checkpoints from the HDDs. The low bandwidth and slow speed of HDDs are now becoming the major bottleneck for MPP system performance. Consequently, novel checkpointing schemes are needed to facilitate the move from petascale computing to exascale computing. We propose a 3D memory stacking method that leverages the massive number of TSVs between memory layers to enable high-bandwidth checkpointing/restore. To validate the proposed scheme, we design a 2-layer TSV-based SRAM/SRAM 3D-stacked chip that mimics high-bandwidth, fast data transfer from one memory layer to another, so that an in-memory checkpointing/restore scheme can be enabled for future exascale computing. The capacity of each SRAM layer is 1 Mbit. Each layer contains 64 banks, and each bank contains 256 words of 64 bits. The final footprint, including I/O pads, is 2.9 mm × 2 mm. The SRAM dies were taped out at GlobalFoundries using its 130 nm low-power process, and the 3D stacking was done using Tezzaron's TSV technology. The prototype chip can perform checkpointing/restart at a rate of 4 K/cycle with a 1 GHz clock.
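The figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration, not the authors' model: it assumes that "4 K/cycle" means 4 Kbit per cycle and that every bank moves one 64-bit word per clock cycle, neither of which the abstract states explicitly. The function names are hypothetical.

```python
# Figures taken from the abstract.
BANKS_PER_LAYER = 64
WORDS_PER_BANK = 256
WORD_BITS = 64
CLOCK_HZ = 1_000_000_000  # 1 GHz


def layer_capacity_bits() -> int:
    """Total bits stored in one SRAM layer."""
    return BANKS_PER_LAYER * WORDS_PER_BANK * WORD_BITS


def bits_per_cycle() -> int:
    """Bits moved between layers per clock cycle.

    ASSUMPTION (not stated in the abstract): all banks transfer
    one word in parallel on every cycle.
    """
    return BANKS_PER_LAYER * WORD_BITS


def full_layer_checkpoint_cycles() -> int:
    """Cycles needed to copy one whole layer under the assumption above."""
    return layer_capacity_bits() // bits_per_cycle()


def full_layer_checkpoint_seconds() -> float:
    """Wall-clock time for a full-layer copy at the quoted clock rate."""
    return full_layer_checkpoint_cycles() / CLOCK_HZ


if __name__ == "__main__":
    print("layer capacity (bits):", layer_capacity_bits())
    print("bits per cycle:", bits_per_cycle())
    print("cycles for a full layer:", full_layer_checkpoint_cycles())
    print("seconds for a full layer:", full_layer_checkpoint_seconds())
```

Under these assumptions the bank geometry reproduces the stated 1 Mbit per layer (64 × 256 × 64 = 1,048,576 bits), and 64 banks × 64 bits gives 4,096 bits per cycle, which is consistent with the quoted "4 K/cycle".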
Keywords :
SRAM chips; checkpointing; disc drives; fault tolerance; hard discs; integrated circuit testing; low-power electronics; multiprocessing systems; parallel processing; three-dimensional integrated circuits; 2-layer TSV-based SRAM/SRAM 3D-stacked chip; 3D memory stacking; I/O pad; MPP system performance; SRAM dies; TSV technology; data transfer; exascale computing; hard disk drive; high failure rate; in-disk checkpointing; low bandwidth; low power process; massive parallel processing; memory layer; petascale computing; restart scheme; restart-restore scheme; size 130 nm; slow speed; system overhead; system-level fault tolerance; word length 64 bit; Bandwidth; Checkpointing; Phase change random access memory; Stacking; Three dimensional displays; Through-silicon vias;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
3D Systems Integration Conference (3DIC), 2010 IEEE International
Conference_Location :
Munich
Print_ISBN :
978-1-4577-0526-7
Type :
conf
DOI :
10.1109/3DIC.2010.5751466
Filename :
5751466