DocumentCode :
167313
Title :
Side effect machine features for analysis and comparison of DNA promoter sequences
Author :
Ashlock, Wendy
Author_Institution :
Biol. Dept., York Univ., Toronto, ON, Canada
fYear :
2014
fDate :
21-24 May 2014
Firstpage :
1
Lastpage :
8
Abstract :
Understanding genomes involves more than just knowing what all the genes are. It also involves knowing how the genes are regulated. An important piece of this concerns gene promoters. These are regions upstream of genes to which groups of proteins (transcription factors) bind in order to initiate transcription of the gene. Much work has focused on finding sequence properties of the DNA binding sites, called binding motifs. The presence of a binding motif in a promoter does not, however, mean that the associated transcription factor is likely to bind to that promoter, only that it is possible. There are other, less understood, features of promoter sequences that determine which of the transcription factors with binding motifs in a promoter will bind to it, and in what combinations. Next-generation sequencing technologies make it possible to experimentally determine which transcription factors bind to which promoters. This technology requires money and the time of skilled experimental biologists, and experiments have only been performed for a few select model organisms. Our goal is to leverage the experimental results on those organisms to provide information about promoters in other organisms. Towards that end, this work uses side effect machines to find sequence features of promoters. Side effect machines are augmented finite state machines that compute DNA sequence features. These features have been used with classifiers with good results in many diverse DNA sequence classification problems. Useful side effect machines for a given problem are found using an evolutionary algorithm. We present a novel fitness function for side effect machines that produce features that can be used to measure the similarity between promoter sequences. We create a distance measure between promoter sequences that is significantly correlated with a distance measure based on the experiments done on the yeast genome.
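The idea of a side effect machine computing sequence features can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the "side effect" is a counter on each state recording how often that state is visited, that counts are normalized by sequence length, and that promoters are compared by Euclidean distance between feature vectors. The names `sem_features`, `feature_distance`, and `TOY_MACHINE` are hypothetical, and the transition table is hand-written for illustration rather than found by an evolutionary algorithm with the paper's fitness function.

```python
from math import sqrt

ALPHABET = "ACGT"  # column order of each transition row


def sem_features(transitions, sequence, start=0):
    """Run a finite state machine over a DNA sequence and return the
    fraction of steps spent in each state (assumed side effect: visit counters).

    transitions[state][i] is the next state on reading ALPHABET[i].
    Raises ValueError on any character outside ACGT.
    """
    n_states = len(transitions)
    counts = [0] * n_states
    state = start
    for base in sequence:
        state = transitions[state][ALPHABET.index(base)]
        counts[state] += 1  # the side effect: bump the counter of the state entered
    total = len(sequence)
    if total == 0:
        return [0.0] * n_states
    return [c / total for c in counts]


def feature_distance(u, v):
    """Euclidean distance between two feature vectors (an assumed choice of metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical 2-state machine: state 1 is entered after C or G, state 0 after A or T,
# so the two features are simply the AT and GC fractions of the sequence.
TOY_MACHINE = [
    [0, 1, 1, 0],  # from state 0: A->0, C->1, G->1, T->0
    [0, 1, 1, 0],  # from state 1: same transitions
]

if __name__ == "__main__":
    f1 = sem_features(TOY_MACHINE, "ACGT")
    f2 = sem_features(TOY_MACHINE, "GGGC")
    print(f1, f2, feature_distance(f1, f2))
```

In this toy case the machine only recovers base composition; machines with more states and less regular transitions can respond to longer-range sequence patterns, which is presumably why the transition table is treated as the object to be evolved.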
Keywords :
DNA; bioinformatics; finite state machines; genomics; molecular biophysics; molecular configurations; DNA binding site sequence properties; DNA promoter sequence analysis; DNA promoter sequence comparison; DNA sequence features; augmented finite state machines; binding motif sequence properties; distance measure; fitness function; gene promoters; gene regulation; gene transcription; genomes; next generation sequencing technologies; promoter sequence features; side effect machine features; transcription factors; yeast genome; Accuracy; Correlation; DNA; Evolutionary computation; Genomics; Organisms; Radiation detectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence in Bioinformatics and Computational Biology, 2014 IEEE Conference on
Conference_Location :
Honolulu, HI
Type :
conf
DOI :
10.1109/CIBCB.2014.6845526
Filename :
6845526