• DocumentCode
    2008307
  • Title
    Ensemble Machine Methods for DNA Binding
  • Author
    Fan, Yue; Kon, Mark A.; DeLisi, Charles
  • Author_Institution
    Dept. of Math. & Stat., Boston Univ., Boston, MA, USA
  • fYear
    2008
  • fDate
    11-13 Dec. 2008
  • Firstpage
    709
  • Lastpage
    716
  • Abstract
    We introduce three ensemble machine learning methods for analyzing the binding of transcription factors (TFs) to DNA. The goal is to identify both TF target genes and their binding motifs. Subspace-valued weak learners, formed from an ensemble of different motif-finding algorithms, combine candidate motifs as probability weight matrices (PWMs), which are then translated into subspaces of a DNA k-mer (string) feature space. Assessing these subspaces and then integrating the most informative ones with machine learning methods gives more reliable target classification and motif prediction. We compare these target identification methods with PWM rescanning and with support vector machines applied to the full k-mer space, using data from the yeast S. cerevisiae. Our method, SVMotif-PWM, can significantly improve accuracy in the computational identification of TF targets. The software is publicly available at http://cagt10.bu.edu/SVMotif.
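    As a rough illustration of the feature space described in the abstract, the sketch below (Python, standard library only) maps a DNA sequence to its k-mer count vector and turns a PWM into a subspace of that space by keeping the k-mers whose score passes a threshold. The threshold rule, the log-odds scoring against a uniform 0.25 background, and all function names are assumptions made for illustration; this is not the SVMotif implementation.

```python
import math
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seq, k):
    """Map a DNA sequence to its count vector over the full k-mer space (4**k coordinates)."""
    counts = {"".join(p): 0 for p in product(ALPHABET, repeat=k)}
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other non-ACGT symbols
            counts[kmer] += 1
    return counts


def pwm_log_odds(kmer, pwm, background=0.25):
    """Log-odds score of a k-mer under a PWM (list of per-position base-probability dicts).

    Assumes every probability in the PWM is nonzero (e.g. pseudocounts were added).
    """
    return sum(math.log(pwm[i][base] / background) for i, base in enumerate(kmer))


def pwm_subspace(pwm, threshold):
    """Hypothetical rule: the subspace is spanned by every k-mer (k = motif width)
    whose PWM log-odds score is at least `threshold`."""
    k = len(pwm)
    return sorted(
        kmer
        for kmer in ("".join(p) for p in product(ALPHABET, repeat=k))
        if pwm_log_odds(kmer, pwm) >= threshold
    )


def project(counts, subspace):
    """Restrict a full k-mer count vector to the coordinates of one subspace."""
    return [counts[kmer] for kmer in subspace]


# Usage: a width-3 motif that strongly prefers "ACG" (pseudocounts avoid log(0)).
pwm = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.97, "G": 0.01, "T": 0.01},
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},
]
strict = pwm_subspace(pwm, 3.0)    # only the consensus k-mer passes
relaxed = pwm_subspace(pwm, -1.0)  # consensus plus all single-mismatch k-mers
counts = kmer_counts("ACGTACGA", 3)
features = project(counts, strict)  # input one subspace-restricted learner might use
```

    In this sketch a stricter threshold yields a smaller, more specific subspace; an ensemble would then score and combine the features from many such subspaces, one per candidate motif.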
  • Keywords
    DNA; biology computing; genetics; learning (artificial intelligence); matrix algebra; pattern classification; probability; biological DNA binding analysis; ensemble machine learning method; motif prediction; probability weight matrix; target classification; transcription factor target gene; Bioinformatics; DNA; Genomics; Learning systems; Machine learning; Mathematics; Sequences; Statistics; Systems biology; DNA; bioinformatics; ensembles; machine learning; transcription factor
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Machine Learning and Applications, 2008. ICMLA '08. Seventh International Conference on
  • Conference_Location
    San Diego, CA
  • Print_ISBN
    978-0-7695-3495-4
  • Type
    conf
  • DOI
    10.1109/ICMLA.2008.114
  • Filename
    4725053