DocumentCode :
2548141
Title :
A compression-based technique for comparing biological sequences
Author :
Mina, Ramez ; Ali, Hesham H.
Author_Institution :
Coll. of Inf. Sci. & Technol., Univ. of Nebraska at Omaha, Omaha, NE, USA
fYear :
2010
fDate :
16-18 Dec. 2010
Firstpage :
94
Lastpage :
97
Abstract :
Comparing biological sequences is one of the most important tools in computational biology. By comparing sequences, we identify similar subsequences, which may lead to the identification of structures as well as similar functions. Sequence alignment has been the method of choice for testing similarity and has gained considerable trust among researchers, though it suffers from some shortcomings. In particular, repetitions in the input sequences often lead to inaccurate results, especially if these repetitions are dispersed throughout the sequence. In this paper, we study alternative methods based on compression techniques, borrowed from information theory, for comparing sequences accurately. We test the proposed technique on various datasets and show that it outperforms alignment-based methods in several cases.
Keywords :
DNA; bioinformatics; data compression; information theory; molecular biophysics; biological sequences; compression-based technique; computational biology; input sequences repetition; sequence alignment; Biomedical engineering
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Biomedical Engineering Conference (CIBEC), 2010 5th Cairo International
Conference_Location :
Cairo
ISSN :
2156-6097
Print_ISBN :
978-1-4244-7168-3
Type :
conf
DOI :
10.1109/CIBEC.2010.5716047
Filename :
5716047