• DocumentCode
    848552
  • Title
    SCS: Signal, Context, and Structure Features for Genome-Wide Human Promoter Recognition
  • Author
    Zeng, Jia ; Zhao, Xiao-Yu ; Cao, Xiao-Qin ; Yan, Hong
  • Author_Institution
    Sch. of Comput. Sci. & Technol., Soochow Univ., Suzhou, China
  • Volume
    7
  • Issue
    3
  • fYear
    2010
  • Firstpage
    550
  • Lastpage
    562
  • Abstract
    This paper integrates signal, context, and structure features for genome-wide human promoter recognition, which is important for improving genome annotation and analyzing transcriptional regulation without experimental support from ESTs, cDNAs, or mRNAs. First, CpG islands are salient biological signals associated with approximately 50 percent of mammalian promoters. Second, the genomic context of promoters may have biological significance; this context is modeled with n-mers (sequences n bases long) and their statistics estimated from training samples. Third, sequence-dependent DNA flexibility originates from DNA 3D structures and plays an important role in guiding transcription factors to the target site in promoters. Employing decision trees, we combine the above signal, context, and structure features to build a hierarchical promoter recognition system called SCS. Experimental results on controlled data sets and the entire human genome demonstrate that SCS is significantly superior in terms of sensitivity and specificity compared to other state-of-the-art methods. The SCS promoter recognition system is available online as supplemental material for academic use and can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2008.95.
  • Keywords
    biology computing; genomics; molecular biophysics; molecular configurations; DNA 3D structures; genome annotation; genome-wide human promoter recognition; genomic context; hierarchical promoter recognition system; salient biological signals; sequence-dependent DNA flexibility; signal, context; transcriptional regulation; Bioinformatics; Computer Society; DNA; Decision trees; Genomics; Humans; Sensitivity and specificity; Sequences; Signal analysis; Statistics; Biology and genetics; Pattern Recognition; Promoter recognition; classifier combination; feature extraction; genome analysis; Algorithms; CpG Islands; Gene Expression Regulation; Genome, Human; Genomics; Humans; Promoter Regions, Genetic; Sequence Analysis, DNA; Societies, Scientific;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2008.95
  • Filename
    4609377