• DocumentCode
    53351
  • Title

    An Algorithm for Motif Discovery with Iteration on Lengths of Motifs

  • Author

    Yetian Fan ; Wei Wu ; Jie Yang ; Wenyu Yang ; Rongrong Liu

  • Author_Institution
    Sch. of Math. Sci., Dalian Univ. of Technol., Dalian, China
  • Volume
    12
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan.-Feb. 1 2015
  • Firstpage
    136
  • Lastpage
    141
  • Abstract
    Analysis of DNA sequence motifs is becoming increasingly important in the study of gene regulation, and the identification of motifs in DNA sequences is a complex problem in computational biology. Motif discovery has attracted the attention of more and more researchers, and a variety of algorithms have been proposed. Most existing motif discovery algorithms fix the motif's length as one of the input parameters. In this paper, a novel method is proposed to identify the optimal length of the motif and the optimal motif with that length, through an iteration process over increasing lengths. For each fixed length, a modified genetic algorithm (GA) is used to find the optimal motif with that length. Three operators are used in the modified GA: Mutation, which is similar to the one used in the usual GA but is modified to avoid local optima in our case, and Addition and Deletion, which are proposed by us for this problem. A criterion is given for singling out the optimal length among the increasing motif lengths. We call this method AMDILM (an algorithm for motif discovery with iteration on lengths of motifs). The experiments on simulated data and real biological data show that AMDILM can accurately identify the optimal motif length. Meanwhile, the optimal motifs discovered by AMDILM are consistent with the real ones and are similar to the motifs obtained by three well-known methods: Gibbs Sampler, MEME and Weeder.
  • Keywords
    DNA; bioinformatics; genetics; iterative methods; molecular biophysics; molecular configurations; AMDILM; DNA sequence motifs analysis; Gibbs Sampler; MEME; Weeder; algorithm-for-motif discovery-with-iteration-on-lengths-of-motifs; biological data; complex problem; computational biology; gene regulation; modified genetic algorithm; motif discovery algorithm; motifs length iteration; mutation; optimal motif length; simulated data; Bioinformatics; Computational biology; DNA; Educational institutions; Genetic algorithms; IEEE transactions; DNA sequences; Motif discovery; motif's length
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2014.2351793
  • Filename
    6891181
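
The outer loop described in the abstract (search for the best motif at each fixed length, then decide when to stop increasing the length) can be illustrated with a rough sketch. Everything specific here is an assumption and not taken from the paper: the abstract does not state the fitness function or the length criterion, so this sketch scores an alignment by summed per-column information content and stops when one extra column adds less than `min_gain` bits. The per-length search is a simple greedy seed-and-match placeholder standing in for the paper's modified GA (Mutation, Addition, Deletion), which is not reproduced. All function names are hypothetical.

```python
import math
from collections import Counter


def information_content(sites):
    """Sum over columns of (2 - Shannon entropy) in bits, for equal-length DNA sites."""
    total = 0.0
    for column in zip(*sites):
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        total += 2.0 - entropy
    return total


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def windows(seq, length):
    """All substrings of the given length, in left-to-right order."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def best_sites_for_length(sequences, length):
    """Greedy placeholder for the per-length search (NOT the paper's modified GA).

    Each window of the first sequence is used as a seed; the closest window
    (by Hamming distance) is taken from every other sequence.
    """
    best_score, best_sites = float("-inf"), []
    for seed in windows(sequences[0], length):
        sites = [seed] + [
            min(windows(s, length), key=lambda w: hamming(seed, w))
            for s in sequences[1:]
        ]
        score = information_content(sites)
        if score > best_score:
            best_score, best_sites = score, sites
    return best_score, best_sites


def discover_motif(sequences, min_len, max_len, min_gain=1.0):
    """Iterate over increasing lengths; stop when the score gain falls below min_gain.

    The stopping rule is an assumed stand-in for the paper's length criterion.
    """
    best_len = min_len
    best_score, best_sites = best_sites_for_length(sequences, min_len)
    for length in range(min_len + 1, max_len + 1):
        score, sites = best_sites_for_length(sequences, length)
        if score - best_score < min_gain:
            break
        best_len, best_score, best_sites = length, score, sites
    return best_len, best_sites


# Toy data: one exact planted motif with different flanking bases per sequence.
motif = "TGACGTCA"
seqs = ["AAAA" + motif + "AAA", "CC" + motif + "CCCCC", "GGG" + motif + "GGGG"]
length, sites = discover_motif(seqs, min_len=4, max_len=10)
print(length, sites)
```

On this toy input each conserved column contributes 2 bits, while the first flanking column contributes well under 1 bit, so the loop settles on the planted length. Real data would need a noise-tolerant score and a stronger search than this greedy pass.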