DocumentCode :
3251606
Title :
Mining genes in DNA using GeneScout
Author :
Yin, Michael M. ; Wang, Jason T L
Author_Institution :
Dept. of Comput. Sci., New Jersey Inst. of Technol., Newark, NJ, USA
fYear :
2002
fDate :
2002
Firstpage :
733
Lastpage :
736
Abstract :
In this paper, we present a new system, called GeneScout, for predicting gene structures in vertebrate genomic DNA. The system contains specially designed hidden Markov models (HMMs) for detecting functional sites, including protein-translation start sites and mRNA splicing junction donor and acceptor sites. Our main hypothesis is that, given a vertebrate genomic DNA sequence S, it is always possible to construct a directed acyclic graph G such that the path for the actual coding region of S is in the set of all paths on G. Thus, the gene detection problem is reduced to that of analyzing the paths in the graph G. A dynamic programming algorithm is used to find the optimal path in G. The proposed system is trained using an expectation-maximization (EM) algorithm, and its performance on vertebrate gene prediction is evaluated using 10-way cross-validation. Experimental results show that the proposed system performs well and complements a widely used gene detection system.
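The reduction described in the abstract (gene detection as finding an optimal path in a directed acyclic graph G) can be illustrated with a minimal sketch. This is not GeneScout's implementation: the abstract does not say how G is built or how edges are scored, so the node names, the edge scores, and the function `best_path` below are hypothetical. The sketch only shows the general technique of dynamic programming over a topological order of a DAG.

```python
from collections import defaultdict, deque


def best_path(edges, source, sink):
    """Return (score, path) for the highest-scoring source-to-sink path in a DAG.

    edges: iterable of (u, v, score) tuples. The scores are hypothetical
    stand-ins for whatever site or segment scores a gene finder assigns.
    Returns None if the sink cannot be reached from the source.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = {source, sink}
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))

    # Topological order via Kahn's algorithm.
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # Dynamic programming: best[v] is the best score of any source-to-v path.
    best = {source: 0.0}
    prev = {}
    for u in order:
        if u not in best:
            continue  # u is not reachable from the source
        for v, w in succ[u]:
            cand = best[u] + w
            if v not in best or cand > best[v]:
                best[v] = cand
                prev[v] = u

    if sink not in best:
        return None

    # Walk the back-pointers from sink to source to recover the path.
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return best[sink], path


# Toy graph with made-up site names and scores.
toy_edges = [
    ("start", "donor1", 2.0),
    ("start", "donor2", 1.0),
    ("donor1", "acceptor", 1.5),
    ("donor2", "acceptor", 4.0),
    ("acceptor", "stop", 1.0),
]
result = best_path(toy_edges, "start", "stop")
```

Because every node is processed after all of its predecessors, each edge is examined once, so the running time is linear in the size of the graph.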
Keywords :
DNA; data mining; dynamic programming; hidden Markov models; medical computing; GeneScout; acceptor sites; expectation-maximization algorithm; gene detection system; gene structures; genes mining; hidden Markov models; mRNA splicing junction donor; protein-translation start sites; vertebrate gene prediction; vertebrate genomic DNA; Algorithm design and analysis; Bioinformatics; DNA; Dynamic programming; Genomics; Hidden Markov models; Performance analysis; Proteins; Sequences; Splicing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on
Print_ISBN :
0-7695-1754-4
Type :
conf
DOI :
10.1109/ICDM.2002.1184041
Filename :
1184041