• DocumentCode
    1744
  • Title
    Discovering Binding Cores in Protein-DNA Binding Using Association Rule Mining with Statistical Measures
  • Author
    Man-Hon Wong ; Ho-Yin Sze-To ; Leung-Yau Lo ; Tak-Ming Chan ; Kwong-Sak Leung
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Hong Kong, China
  • Volume
    12
  • Issue
    1
  • fYear
    2015
  • fDate
    Jan.-Feb. 1 2015
  • Firstpage
    142
  • Lastpage
    154
  • Abstract
    Understanding binding cores is of fundamental importance in deciphering protein-DNA (TF-TFBS) binding and in understanding gene regulation. Traditionally, binding cores are identified in resolved high-resolution 3D structures. However, obtaining these structures is expensive, labor-intensive and time-consuming. Hence, it is promising to discover binding cores computationally on a large scale. Previous studies successfully applied association rule mining to discover binding cores from TF-TFBS binding sequence data only. Despite the successful results, these approaches have limitations, such as the use of tight support and confidence thresholds, the distortion caused by statistical bias in counting pattern occurrences, and the lack of a unified scheme to rank TF-TFBS associated patterns. In this study, we proposed an association rule mining algorithm incorporating statistical measures and ranking to address these limitations. Experimental results demonstrated that, even when the threshold on support was lowered to one-tenth of the value used in previous studies, a satisfactory verification ratio was consistently observed under different confidence levels. Moreover, we proposed a novel ranking scheme for TF-TFBS associated patterns based on p-values and co-support values. Comparison with other discovery approaches demonstrated the effectiveness of our algorithm. Eighty-four binding cores with PDB support were uniquely identified.
  • Keywords
    DNA; bioinformatics; data mining; genetics; molecular biophysics; molecular configurations; proteins; statistical analysis; TF-TFBS binding sequence data; association rule mining; association rule mining algorithm; binding cores; confidence thresholds; co-support values; counting pattern occurrences; gene regulation; p-values; protein-DNA binding; resolved high-resolution 3D structures; satisfactory verification ratio; statistical bias; statistical measures; Databases; IEEE transactions; Three-dimensional displays
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2343952
  • Filename
    6867331