DocumentCode
2008307
Title
Ensemble Machine Methods for DNA Binding
Author
Fan, Yue ; Kon, Mark A. ; DeLisi, Charles
Author_Institution
Dept. of Math. & Stat., Boston Univ., Boston, MA, USA
fYear
2008
fDate
11-13 Dec. 2008
Firstpage
709
Lastpage
716
Abstract
We introduce three ensemble machine learning methods for the analysis of biological DNA binding by transcription factors (TFs). The goal is to identify both TF target genes and their binding motifs. Subspace-valued weak learners (formed from an ensemble of different motif-finding algorithms) combine candidate motifs as probability weight matrices (PWMs), which are then translated into subspaces of a DNA k-mer (string) feature space. Assessing and then integrating highly informative subspaces by machine methods gives more reliable target classification and motif prediction. We compare these target identification methods with PWM rescanning and with the use of support vector machines on the full k-mer space of the yeast S. cerevisiae. This method, SVMotif-PWM, can significantly improve accuracy in computational identification of TF targets. The software is publicly available at http://cagt10.bu.edu/SVMotif .
Keywords
DNA; biology computing; genetics; learning (artificial intelligence); matrix algebra; pattern classification; probability; biological DNA binding analysis; ensemble machine learning method; motif prediction; probability weight matrix; target classification; transcription factor target gene; Bioinformatics; Genomics; Learning systems; Machine learning; Mathematics; Pulse width modulation; Sequences; Statistics; Systems biology; ensembles; transcription factor
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
Conference_Location
San Diego, CA
Print_ISBN
978-0-7695-3495-4
Type
conf
DOI
10.1109/ICMLA.2008.114
Filename
4725053