Title :
A compression-based technique for comparing biological sequences
Author :
Mina, Ramez ; Ali, Hesham H.
Author_Institution :
Coll. of Inf. Sci. & Technol., Univ. of Nebraska at Omaha, Omaha, NE, USA
Abstract :
Comparing biological sequences is one of the most important tools in computational biology. By comparing sequences, we identify similar subsequences, which may lead to the identification of similar structures and functions. Sequence alignment has been the method of choice for assessing similarity and has gained a great deal of trust among researchers, though it suffers from some shortcomings. In particular, repetitions in the input sequences often lead to inaccurate results, especially when these repetitions are dispersed throughout the sequence. In this paper, we study alternative methods based on compression techniques, borrowed from information theory, to compare sequences more accurately. We test the proposed technique on various datasets and show that it outperforms alignment-based methods in several cases.
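The abstract does not say which compression measure or which compressor the authors use. The sketch below is therefore only an illustration of the general idea. It uses the normalized compression distance (NCD), a widely used compression-based similarity measure, with Python's `zlib` as the compressor. Both choices are assumptions and are not taken from the paper. The helper names `compressed_size` and `ncd` are hypothetical. The intuition is that if two sequences share structure, their concatenation compresses almost as well as the larger of the two alone, which pushes NCD toward 0. Unrelated sequences give values near 1.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of data after zlib compression at level 9.

    zlib is an assumed, general-purpose stand-in; it is not tuned for DNA.
    """
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance (hypothetical helper):

        NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))

    where C(.) is the compressed size. Values near 0 indicate shared
    structure; values near 1 indicate little shared structure.
    """
    bx, by = x.encode("ascii"), y.encode("ascii")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    rng = random.Random(0)
    # Two independent random DNA sequences (illustrative data only).
    s = "".join(rng.choice("ACGT") for _ in range(2000))
    t = "".join(rng.choice("ACGT") for _ in range(2000))
    # A copy of s with a point substitution at every 100th position.
    m = "".join(
        ("C" if c == "A" else "A") if i % 100 == 0 else c
        for i, c in enumerate(s)
    )
    print("NCD(s, s) =", round(ncd(s, s), 3))
    print("NCD(s, m) =", round(ncd(s, m), 3))
    print("NCD(s, t) =", round(ncd(s, t), 3))
```

Running the script should give a small distance for the identical pair, a somewhat larger one for the mutated copy, and a value close to 1 for the unrelated pair. No alignment is computed at any point: similarity is read off purely from how well the compressor exploits repeated content across the two inputs.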
Keywords :
DNA; bioinformatics; data compression; information theory; molecular biophysics; biological sequences; compression-based technique; computational biology; input sequences repetition; sequence alignment; biomedical engineering
Conference_Title :
Biomedical Engineering Conference (CIBEC), 2010 5th Cairo International
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-7168-3
DOI :
10.1109/CIBEC.2010.5716047