DocumentCode :
2319728
Title :
An efficient machine learning approach to low-complexity filtering in biological sequences
Author :
Barber, Christopher A. ; Oehmen, Christopher S.
Author_Institution :
Pacific Northwest Nat. Lab., Richland, WA, USA
fYear :
2012
fDate :
9-12 May 2012
Firstpage :
237
Lastpage :
243
Abstract :
Biological sequences contain low-complexity regions (LCRs) which produce superfluous matches in homology searches, and lead to slow execution of database search algorithms such as BLAST. These regions are efficiently identified by low-complexity filtering algorithms such as SDUST and SEG, which are included in the BLAST tool-suite. These algorithms target differing notions of complexity, so an algorithm which combines their sensitivities is pursued. A variety of features are derived from these algorithms, as well as a new filtering algorithm based on Lempel-Ziv complexity. Artificial sequences with known LCRs are used to train and evaluate an SVM classifier, which significantly outperforms the standalone filtering algorithms.
Keywords :
bioinformatics; biological techniques; learning (artificial intelligence); molecular biophysics; search problems; support vector machines; Lempel-Ziv complexity based filtering algorithm; SDUST; SEG; SVM classifier; biological sequences; database search algorithms; homology searches; low complexity filtering algorithms; low complexity regions; machine learning approach; superfluous matches; support vector machine; Accuracy; Complexity theory; DNA; Entropy; Markov processes; Proteins; Support vector machines; bioinformatics; complexity measures; filtering;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2012 IEEE Symposium on
Conference_Location :
San Diego, CA
Print_ISBN :
978-1-4673-1190-8
Type :
conf
DOI :
10.1109/CIBCB.2012.6217236
Filename :
6217236