Title :
Data Analytics for Protein-DNA Binding Interactions
Author_Institution :
Dept. of Comput. Sci., City Univ. of Hong Kong, Kowloon Tong, China
Abstract :
Determining protein-DNA binding specificity is an important step in understanding genetic codes. With a large number of protein-DNA complexes available, mature statistical and data mining techniques, and efficient computing resources, a fundamental and comprehensive analysis of protein-DNA binding sequences is conducted and described in this work. In particular, two types of analysis are proposed. First, statistical analysis is conducted to give holistic insights into the protein-DNA binding sequences. Second, data mining techniques are applied to extract interesting sequence patterns that take both the protein side and the DNA side into account. The results demonstrate that there are statistically enriched sequence patterns among the protein-DNA binding sequences. Nonetheless, the big data analysis also confirms that no single general principle governs protein-DNA binding. To address this, contemporary data mining methods are introduced to discover advanced sequence patterns. The patterns are validated against an external database, revealing biological insights into protein-DNA binding interactions.
Keywords :
"Proteins","DNA","Data mining","Amino acids","Statistical analysis","Color"
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on
DOI :
10.1109/SMC.2015.278