Title :
Terminator Detection by Support Vector Machine Utilizing a Stochastic Context-Free Grammar
Author :
Francis-Lyon, P. ; Cristianini, Nello ; Holbrook, Stephen
Author_Institution :
California Univ., Davis, CA
Abstract :
A 2-stage detector was designed to find rho-independent transcription terminators in the Escherichia coli genome. The detector includes a stochastic context-free grammar (SCFG) component and a support vector machine (SVM) component. To find terminators, the SCFG searches the intergenic regions of the nucleotide sequence for local matches to a terminator grammar that was designed and trained using examples of known terminators. The grammar selects the sequences that are the best terminator candidates and assigns each a prefix, stem-loop, suffix structure using the Cocke-Younger-Kasami (CYK) algorithm, modified to incorporate the energy effects of base pairing. The parameters of this inferred structure are passed to the SVM classifier, which distinguishes terminators from non-terminators that score high according to the terminator grammar. The SVM was trained with negative examples drawn from intergenic sequences that include both featureless and RNA gene regions (which were assigned prefix, stem-loop, suffix structure by the SCFG), so that it successfully distinguishes terminators from either of these. The classifier was found to be 96.4% successful during testing.
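The span-by-span search for a stem-loop can be illustrated with a minimal Python sketch. It is not the authors' grammar: it is a simplified, non-stochastic dynamic program that fills spans from short to long in the manner of CYK and scores only gap-free stems. The function name `best_stem_loop`, the `min_loop` parameter, and the pair scores are assumptions chosen for illustration, not the paper's energy parameters.

```python
# Illustrative pair scores (assumed values, not the paper's energy terms).
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}


def best_stem_loop(seq, min_loop=3):
    """Return (score, start, end) of the best-scoring gap-free hairpin.

    start and end are 0-based indices of the outermost base pair.
    Returns (0, -1, -1) when no hairpin is found.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    n = len(rna)
    # stem[i][j]: score of a stack of pairs whose outermost pair is (i, j).
    stem = [[0] * n for _ in range(n)]
    best = (0, -1, -1)
    # Fill spans from short to long, as in CYK-style parsing.
    # A span of min_loop + 2 is one pair closing min_loop unpaired bases.
    for span in range(min_loop + 2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            score = PAIR_SCORE.get((rna[i], rna[j]))
            if score is None:
                continue  # bases i and j cannot pair
            # Extend the stem that sits directly inside (i, j), if any.
            stem[i][j] = score + stem[i + 1][j - 1]
            if stem[i][j] > best[0]:
                best = (stem[i][j], i, j)
    return best


print(best_stem_loop("GGGCAAAAGCCC"))  # (12, 0, 11)
```

In a pipeline like the one described, the score and the stem and loop boundaries returned here would be the kind of structural parameters handed to a downstream classifier.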
Keywords :
biology computing; context-free grammars; microorganisms; support vector machines; 2-stage detector; Cocke-Younger-Kasami algorithm; Escherichia coli genome; rho-independent transcription terminator; stochastic context-free grammar; support vector machine; terminator detection; Bioinformatics; Computational intelligence; DNA; Detectors; Proteins; RNA; Sequences; Stochastic processes; Support vector machine classification; Support vector machines;
Conference_Title :
Computational Intelligence and Bioinformatics and Computational Biology, 2007. CIBCB '07. IEEE Symposium on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0710-9
DOI :
10.1109/CIBCB.2007.4221220