Title :
A Genetic-Based EM Motif-Finding Algorithm for Biological Sequence Analysis
Author_Institution :
Schools of Medicine, Comput. & Eng., Missouri Univ., Kansas City, MO
Abstract :
Motif finding in biological sequence analysis remains a challenge in computational biology, and many algorithms and software packages have been developed to address it. Expectation maximization (EM)-type motif algorithms such as MEME are among the most popular de novo motif discovery methods. However, as pointed out in the literature, EM algorithms depend heavily on their initialization and are easily trapped in local optima. This paper proposes and implements a genetic-based EM motif-finding algorithm (GEMFA) that aims to overcome these drawbacks. GEMFA first initializes a population of multiple local alignments, each encoded on a chromosome that represents a potential solution. It then performs a heuristic search over the whole alignment space, using minimum description length (MDL), generalized from the maximum log-likelihood, as the fitness function. The genetic algorithm gradually moves the population towards the best alignment, from which the motif model is derived. Analyses of simulated and real biological sequences showed that GEMFA performed better than simple multiple-restart EM motif finding and other similar algorithms, especially on subtle motif sequence alignments.
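The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes exactly one motif occurrence per sequence, encodes a chromosome as a list of motif start positions, and uses a plain log-likelihood ratio against a uniform background as a stand-in for the MDL fitness. The EM step is omitted, and the names `fitness` and `evolve` and all parameter values are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"

def fitness(seqs, starts, width, pseudo=1.0):
    """Log-likelihood ratio of the aligned windows against a uniform
    background (a stand-in for the MDL fitness, not the paper's formula)."""
    total = len(seqs) + pseudo * len(ALPHABET)
    score = 0.0
    for col in range(width):
        # Column counts with pseudocounts to avoid zero probabilities.
        counts = {base: pseudo for base in ALPHABET}
        for seq, start in zip(seqs, starts):
            counts[seq[start + col]] += 1
        for seq, start in zip(seqs, starts):
            p = counts[seq[start + col]] / total
            score += math.log(p / 0.25)
    return score

def evolve(seqs, width, pop_size=50, generations=100, mut_rate=0.2, seed=0):
    """Evolve a population of alignments; each chromosome holds one
    start position per sequence (assumed encoding)."""
    rng = random.Random(seed)
    limits = [len(s) - width for s in seqs]
    pop = [[rng.randint(0, m) for m in limits] for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism).
        pop.sort(key=lambda c: fitness(seqs, c, width), reverse=True)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(seqs))   # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mut_rate:         # move one start position
                i = rng.randrange(len(seqs))
                child[i] = rng.randint(0, limits[i])
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda c: fitness(seqs, c, width))
```

Calling `evolve(["TTACGTTT", "ACGTGGGG", "CCCCACGT"], width=4)` returns one start position per sequence. The sketch needs at least two sequences and a population of at least four for the crossover and sampling steps to work.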
Keywords :
biology computing; data mining; expectation-maximisation algorithm; genetic algorithms; EM motif discovery; biological sequence analysis; computational biology; expectation maximization-type motif algorithm; fitness function; genetic algorithm; genetic-based EM motif-finding algorithm; maximum log-likelihood; minimum description length; Algorithm design and analysis; Analytical models; Biological cells; Biological information theory; Biological system modeling; Computational biology; Genetic algorithms; Sequences; Software algorithms; Software packages
Conference_Title :
Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0710-9
DOI :
10.1109/CIBCB.2007.4221233