DocumentCode :
3714361
Title :
A new algorithm for “the LCS problem” with application in compressing genome resequencing data
Author :
Richard Beal; Tazin Afrin; Aliya Farheen; Don Adjeroh
Author_Institution :
Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, US
fYear :
2015
Firstpage :
69
Lastpage :
74
Abstract :
The longest common subsequence (LCS) problem is a classical problem in computer science, and forms the basis of the current best-performing reference-based compression schemes for genome resequencing data. First, we present a new algorithm for the LCS problem. Then, we introduce an LCS-motivated reference-based compression scheme that uses the components of the LCS, rather than the LCS itself. For the Homo sapiens genome (original size 3,080,436,051 bytes), our proposed scheme compressed the genome to 5,267,656 bytes. This can be compared with the previous best results of 19,666,791 bytes (Wang and Zhang, 2011) and 17,971,030 bytes (Pinho, Pratas, and Garcia, 2011). Thus, our compressed output is about 3.73 and 3.41 times smaller, respectively, than those of the state-of-the-art reference-based compression algorithms.
Keywords :
"Lead","Genomics","Bioinformatics","Yttrium"
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/BIBM.2015.7359657
Filename :
7359657