DocumentCode :
3728173
Title :
Data Analytics for Protein-DNA Binding Interactions
Author :
Ka-Chun Wong
Author_Institution :
Dept. of Comput. Sci., City Univ. of Hong Kong, Kowloon Tong, China
fYear :
2015
Firstpage :
1573
Lastpage :
1578
Abstract :
Determining protein-DNA binding specificity is an important step in understanding genetic codes. With a large number of protein-DNA complexes, mature statistical and data mining techniques, and efficient computational power available, a fundamental and comprehensive protein-DNA binding sequence analysis is conducted and described in this work. In particular, two types of analysis are proposed and described. First, statistical analysis is conducted to give holistic insights into the protein-DNA binding sequences. Second, data mining techniques are applied to extract interesting sequence patterns that take both sides (protein and DNA) into account. The results demonstrate that there are statistically enriched sequence patterns among the protein-DNA binding sequences. Nonetheless, they also confirm, in a big data analytics manner, that there is no general principle in protein-DNA binding. To address that, contemporary data mining methods are introduced to discover advanced sequence patterns. The patterns are validated with an external database, revealing biological insights into protein-DNA binding interactions.
Keywords :
"Proteins","DNA","Data mining","Amino acids","Statistical analysis","Color"
Publisher :
IEEE
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/SMC.2015.278
Filename :
7379410