  • DocumentCode
    66040
  • Title
    Modeling Associated Protein-DNA Pattern Discovery with Unified Scores
  • Author
    Tak-Ming Chan; Leung-Yau Lo; Ho-Yin Sze-To; Kwong-Sak Leung; Xinshu Xiao; Man-Hon Wong
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin, China
  • Volume
    10
  • Issue
    3
  • fYear
    2013
  • fDate
    May-June 2013
  • Firstpage
    696
  • Lastpage
    707
  • Abstract
    Understanding protein-DNA interactions, specifically the binding between transcription factors (TFs) and transcription factor binding sites (TFBSs), is crucial for deciphering gene regulation. The recently proposed associated TF-TFBS pattern discovery combines one-sided motif discovery on both the TF and the TFBS sides. Using sequences only, it identifies short protein-DNA binding cores that are otherwise available only from high-resolution 3D structures. The discovered patterns lead to promising applications in subtype and disease analysis. While related studies use either association rule mining or existing TFBS annotations, none has proposed a formal unified (both-sided) model to prioritize the top verifiable associated patterns. We propose unified scores and develop an effective pipeline for associated TF-TFBS pattern discovery. Our stringent instance-level evaluations show that the patterns with the top unified scores match the binding cores in 3D structures considerably better than previous works, with up to 90 percent of the top 20 scored patterns verified. We also introduce extended verification based on literature surveys, in which high unified scores correspond to even higher verification percentages. The top-scored patterns are confirmed to match the known WRKY binding cores, for which no 3D structures are available, and agree well with the top binding affinities measured in in vivo experiments.
  • Keywords
    DNA; bioinformatics; bonds (chemical); data mining; genetics; molecular biophysics; molecular configurations; proteins; 3D structure binding core; WRKY binding core; associated TF-TFBS pattern discovery; associated protein-DNA pattern discovery modeling; association rule mining; both-sided model; disease analysis application; existing TFBS annotation; formal unified model; gene regulation; high resolution 3D structure; high unified score; high verification percentage; in vivo experiment; instance-level evaluation; literature survey extended verification; one-sided motif discovery; protein-DNA interaction; scored pattern verification; sequence usage; short protein-DNA binding core identification; subtype analysis application; top binding affinity; top scored pattern; top unified score pattern; transcription factor binding site; Association rules; Diseases; Pattern matching; Three-dimensional displays; TF-TFBS associated pattern discovery; binding rules; motif discovery; protein-DNA interactions
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2013.60
  • Filename
    6517185