DocumentCode :
1987118
Title :
SMASHing regulatory sites in DNA by human-mouse sequence comparisons
Author :
Zavolan, Mihaela ; Rajewsky, Nikolaus ; Socci, Nicholas D. ; Gaasterland, Terry
Author_Institution :
Lab. for Comput. Genomics, Rockefeller Univ., New York, NY, USA
fYear :
2003
fDate :
11-14 Aug. 2003
Firstpage :
277
Lastpage :
286
Abstract :
Regulatory sequence elements provide important clues to understanding and predicting gene expression. Although the binding sites for hundreds of transcription factors are known, there has been no systematic attempt to incorporate this information in the annotation of the human genome. Cross-species sequence comparisons are critical to a meaningful annotation of regulatory elements, since these elements generally reside in conserved noncoding regions. To take advantage of the recently completed drafts of the mouse and human genomes for annotating transcription factor binding sites, we developed SMASH, a computational pipeline that identifies thousands of orthologous human/mouse proteins, maps them to genomic sequences, extracts and compares upstream regions, and annotates putative regulatory elements in conserved, noncoding, upstream regions. Our current dataset consists of approximately 2500 human/mouse gene pairs. Transcription start sites were estimated by mapping quasi-full-length cDNA sequences. SMASH uses a novel probabilistic method to identify putative conserved binding sites that takes into account the competition between transcription factors for binding DNA. SMASH presents the results via a genome browser web interface, which displays the predicted regulatory information together with the current annotations for the human genome. Our results are validated by comparison to previously published experimental data. SMASH results compare favorably to other existing computational approaches.
Keywords :
DNA; biology computing; genetics; molecular biophysics; pattern recognition; probability; proteins; SMASHing regulatory sites; annotating transcription factor binding sites; computational approaches; computational pipeline; conserved noncoding regions; cross species sequence comparison; gene expression prediction; genome browser web interface; human gene pair; human genome; human-mouse sequence comparison; mouse gene pairs; mouse proteins; orthologous human proteins; probabilistic method; putative regulatory elements; quasifull length cDNA sequences mapping; regulatory sequence elements; upstream regions; Bioinformatics; DNA; Displays; Gene expression; Genomics; Humans; Mice; Pipelines; Protein engineering; Sequences;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
Print_ISBN :
0-7695-2000-6
Type :
conf
DOI :
10.1109/CSB.2003.1227328
Filename :
1227328