Title :
Discovering protein-DNA binding cores by aligned pattern clustering
Author :
Lee, En-Shiun Annie ; Ho-Yin Sze-To ; Man-Hon Wong ; Kwong-Sak Leung ; Lau, Terrence Chi-Kong ; Wong, Andrew K. C.
Author_Institution :
Syst. Design Eng., Univ. of Waterloo, Waterloo, ON, Canada
Abstract :
Understanding binding cores is of fundamental importance in deciphering protein-DNA (TF-TFBS) binding and gene regulation. Variations (or mutations) in binding cores are ubiquitous and affect binding specificity to different degrees. To reduce the need for expensive experiments, we have developed a new method that discovers binding cores directly from sequence data and supports the study of the effects of variations. Although existing computational methods have produced satisfactory TF-TFBS binding cores, their results are only one-to-one mappings that carry no site-specific information on residue/nucleotide variations, and they largely overlap one another. In this study, we propose a new representation for modeling TF-TFBS binding with variants, known as TF-TFBS Co-Supportive Aligned Pattern Clusters (APCs), which is more compact, gives more detail on site-specific variants, and is biologically more intuitive to analyze. We have also developed an algorithm that discovers TF-TFBS Co-Supportive APCs, capturing binding cores with higher precision and a much shorter runtime (≥1600X faster) than other methods. We also statistically analyze the variants in TF-TFBS Co-Supportive APCs and demonstrate that they can assist homology modeling to synthesize new biological knowledge.
Keywords :
DNA; bioinformatics; data mining; genetics; genomics; molecular clusters; pattern clustering; proteins; statistical analysis; TF-TFBS binding cores; TF-TFBS co-supportive aligned pattern clusters; aligned pattern clustering; binding specificity; biological knowledge; deciphering protein-DNA binding; gene regulation; homology modeling; mutations; one-to-one mappings; protein-DNA binding cores; residue-nucleotide variations; sequence data binding cores; site-specific variants; Amino acids; Educational institutions; Pattern matching; Three-dimensional displays; Aligned Pattern Cluster; Association Rule Mining; Binding Cores; Protein-DNA Binding
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference on
Conference_Location :
Belfast
DOI :
10.1109/BIBM.2014.6999140