Title :
Motif discovery in upstream sequences of coordinately expressed genes
Author :
Stine, Matt ; Dasgupta, Dipankar ; Mukatira, Suraj
Author_Institution :
Div. of Comput. Sci., Memphis Univ., TN, USA
Abstract :
The paper presents a genetic mining approach to discovering highly conserved motifs among upstream sequences of co-regulated genes. These motifs represent putative cis-regulatory elements that could play an important role in the coordinated expression of these genes. A structured genetic algorithm (St-GA) was used to evolve candidate motifs of variable length. Fitness values were assigned as a function of high-scoring alignments performed with NCBI BLAST. The St-GA performed favorably with respect to existing methods on simple (l,k) insertion problems, but was unable to overcome the (l,4) insertion problem that has proved elusive to other methods. Deterministic crowding was added to the St-GA to help cope with the multimodal nature of real-world genomic data. The genetic search was performed on a set of genes selected because their expression values are highly predictive of a subtype of pediatric acute lymphoblastic leukemia (ALL). Four high-scoring motifs were obtained that successfully matched subsequences of cis-elements found in the TRANSFAC database. The results demonstrate that the St-GA approach to motif finding has the potential to be a competitive method for this type of problem.
Keywords :
biology; data mining; genetic algorithms; genetics; NCBI BLAST; St-GA; TRANSFAC database; cis-regulatory elements; coordinately expressed genes; coregulated genes; deterministic crowding; genetic mining; genetic search; high scoring alignments; motif discovery; pediatric ALL; real-world genomic data; simple insertion problems; structured genetic algorithm; upstream sequences; Bioinformatics; Computer science; Databases; Frequency estimation; Gene expression; Genetic algorithms; Genomics; Hospitals; Pediatrics; Sequences;
Conference_Title :
Evolutionary Computation, 2003. CEC '03. The 2003 Congress on
Print_ISBN :
0-7803-7804-0
DOI :
10.1109/CEC.2003.1299863