DocumentCode :
2998407
Title :
Investigating Memory Optimization of Hash-index for Next Generation Sequencing on Multi-core Architecture
Author :
Wang, Wendi ; Tang, Wen ; Li, Linchuan ; Tan, Guangming ; Zhang, Peiheng ; Sun, Ninghui
Author_Institution :
High Performance Comput. Res. Center, Inst. of Comput. Technol., Beijing, China
fYear :
2012
fDate :
21-25 May 2012
Firstpage :
665
Lastpage :
674
Abstract :
Next Generation Sequencing (NGS) is gaining interest due to growing requirements and decreasing sequencing cost. An important prerequisite step in most NGS applications is mapping short sequences, called reads, to template reference sequences. Both the explosion of NGS data, with billions of reads generated each day, and the data-intensive computations pose great challenges to the capability of existing computing systems. In this paper, we take a hash-index-based algorithm (PerM) as an example to investigate optimization approaches for accelerating NGS read mapping on multi-core architectures. First, we propose a new parallel algorithm that reorders hash-bucket accesses among multiple threads to improve data locality in the shared cache. Second, to reduce the number of empty hash buckets, we propose a serialized hash index compression algorithm that matches the sequential access pattern of the new parallel algorithm. The reduced hash index size also makes it possible to use longer hash keys, which alleviates hash conflicts and improves query performance. Experiments on an 8-socket, 8-core Intel Xeon X7550 SMP with 128 GB of memory show that the new parallel algorithm reduces the last-level cache (LLC) miss ratio to 8% to 15% of that of the original algorithm and improves overall performance by 4 to 11 times (6 times on average).
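The abstract gives no implementation details, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions: k-mers are packed 2 bits per base into integer keys, empty buckets are dropped by storing only the sorted keys of non-empty buckets plus an offset array (a CSR-style layout chosen here for illustration), and a batch of queries is answered in ascending key order so that bucket accesses sweep forward through memory. The names `encode`, `CompressedHashIndex`, `lookup` and `batch_lookup` are hypothetical and are not PerM's API; this is not the authors' method.

```python
from bisect import bisect_left

# Assumption: 2-bit encoding of the four DNA bases.
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Pack a DNA k-mer into an integer hash key, 2 bits per base."""
    key = 0
    for ch in kmer:
        key = (key << 2) | BASES[ch]
    return key


class CompressedHashIndex:
    """Hypothetical compressed hash index that stores only non-empty buckets.

    keys[i] is the i-th non-empty bucket key (ascending); the reference
    positions in that bucket are positions[offsets[i]:offsets[i + 1]].
    """

    def __init__(self, reference, k):
        self.k = k
        buckets = {}
        for pos in range(len(reference) - k + 1):
            buckets.setdefault(encode(reference[pos:pos + k]), []).append(pos)
        # Serialize the non-empty buckets in key order; empty ones take no space.
        self.keys = sorted(buckets)
        self.offsets = [0]
        self.positions = []
        for key in self.keys:
            self.positions.extend(buckets[key])
            self.offsets.append(len(self.positions))

    def _bucket(self, i):
        return self.positions[self.offsets[i]:self.offsets[i + 1]]

    def lookup(self, kmer):
        """Random-access query: binary search over the non-empty keys."""
        key = encode(kmer)
        i = bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            return []
        return self._bucket(i)

    def batch_lookup(self, kmers):
        """Answer queries in ascending key order so bucket accesses only move forward."""
        order = sorted(range(len(kmers)), key=lambda j: encode(kmers[j]))
        results = [None] * len(kmers)
        i = 0  # cursor into self.keys; never moves backward
        for j in order:
            key = encode(kmers[j])
            while i < len(self.keys) and self.keys[i] < key:
                i += 1
            if i < len(self.keys) and self.keys[i] == key:
                results[j] = self._bucket(i)
            else:
                results[j] = []
        return results


if __name__ == "__main__":
    index = CompressedHashIndex("ACGTACGTTT", k=3)
    print(len(index.keys), "non-empty buckets instead of", 4 ** 3)
    print(index.batch_lookup(["TTT", "ACG", "AAA", "CGT"]))
```

With 3-mers, the example reference occupies only 6 of the 64 possible buckets, which illustrates why dropping empty buckets leaves room for longer keys. In a multi-threaded setting, one plausible (assumed, not confirmed by the source) extension is to give each thread a contiguous range of the sorted queries, so that threads sweep neighboring regions of the index; Python itself will not show the cache effects the paper measures.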
Keywords :
cache storage; cryptography; data compression; microprocessor chips; multiprocessing systems; optimisation; parallel architectures; query processing; 8-socket 8-cores Intel Xeon X7550 SMP; LLC miss ratio; NGS applications; PerM; bucket access reorders; hash conflicts; hash keys; hash-index; memory optimization; multicore architecture; next generation sequencing; parallel algorithm; query performance; serialized hash index compression algorithm; shared cache; short sequence mapping; Algorithm design and analysis; Bioinformatics; Genomics; Indexes; Memory management; Optimization; Parallel processing; memory optimization; next-generation sequencing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-0974-5
Type :
conf
DOI :
10.1109/IPDPSW.2012.83
Filename :
6270705