DocumentCode :
1929452
Title :
Statistical learning for detecting protein-DNA-binding sites
Author :
Martinetz, Thomas ; Gewehr, Jan E. ; Kim, Jan T.
Author_Institution :
Inst. for Neuro- & Bioinformatics, Univ. of Lübeck, Germany
Volume :
4
fYear :
2003
fDate :
20-24 July 2003
Firstpage :
2940
Abstract :
Detecting the sites on genomic DNA at which DNA-binding proteins bind is a highly relevant task in bioinformatics. For example, the binding sites of transcription factors are key elements of regulatory networks and determine the location of genes on a genome. Usually, for a given DNA-binding protein, only a few DNA subsequences at which the protein binds are known experimentally. The task then is to deduce the global binding characteristics of the protein from these few positive examples. A widespread approach is the so-called profile matrix (PM). The PM approach can be interpreted as a linear classifier (binding word class vs. non-binding word class) within the space of sequence words, with the profile of the experimentally verified binding sites determining its parameters. In this paper, a novel approach called the binding matrix (BM) is introduced. Like the PM, the BM realizes a linear classification, but in contrast to the PM approach, the parameters (the matrix) of the classifier are determined by maximum likelihood estimation. Tested on data from the TRANSFAC database, maximum likelihood estimation improves classification performance by about an order of magnitude.
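As a rough illustration of the profile-matrix idea described in the abstract, the following Python sketch builds position-specific log-odds weights from a few aligned binding sites and scores a sequence word as a sum of those weights, which is a linear function of the word's one-hot encoding. The example sites, the pseudocount, the uniform background, and the threshold are all assumptions made for illustration; they are not taken from the paper, and the sketch does not implement the paper's binding-matrix (maximum likelihood) method.

```python
import math

ALPHABET = "ACGT"


def profile_matrix(sites, pseudocount=1.0):
    """Build per-position log-odds weights from aligned, equal-length sites.

    Assumptions (not from the paper): additive pseudocounts and a uniform
    background distribution over the four bases.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    background = 1.0 / len(ALPHABET)
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: math.log((counts[base] / total) / background) for base in ALPHABET}
        )
    return matrix


def score(matrix, word):
    """Linear score: sum of the weight of the observed base at each position."""
    if len(word) != len(matrix):
        raise ValueError("word length must match matrix length")
    return sum(column[base] for column, base in zip(matrix, word))


def binds(matrix, word, threshold=0.0):
    """Classify a word as binding if its score exceeds the (assumed) threshold."""
    return score(matrix, word) > threshold


# Hypothetical positive examples, chosen only to make the sketch runnable.
known_sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
pm = profile_matrix(known_sites)
```

With these made-up sites, `binds(pm, "TATAAT")` classifies the word as binding and `binds(pm, "GCGCGC")` does not. In the approach the abstract proposes, the classifier keeps the same linear form, but its weights would be fitted by maximum likelihood rather than read off the profile.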
Keywords :
DNA; biology computing; maximum likelihood estimation; pattern classification; proteins; DNA binding protein; TRANSFAC database; binding-matrix approach; bioinformatics; genomic DNA; linear classification; protein-DNA-binding sites; statistical learning; transcription factors; Databases; Genetics; Genomics; Sequences; Testing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Neural Networks, 2003. Proceedings of the International Joint Conference on
ISSN :
1098-7576
Print_ISBN :
0-7803-7898-9
Type :
conf
DOI :
10.1109/IJCNN.2003.1224038
Filename :
1224038