DocumentCode :
1783227
Title :
Enabling In-Situ Data Analysis for Large Protein-Folding Trajectory Datasets
Author :
Zhang, Boyu ; Estrada, Trilce ; Cicotti, Pietro ; Taufer, Michela
fYear :
2014
fDate :
19-23 May 2014
Firstpage :
221
Lastpage :
230
Abstract :
This paper presents a one-pass, distributed method that enables in-situ data analysis for large protein folding trajectory datasets by executing sufficiently fast, avoiding moving trajectory data, and limiting the memory usage. First, the method extracts the geometric shape features of each protein conformation in parallel. Then, it classifies sets of consecutive conformations into meta-stable and transition stages using a probabilistic hierarchical clustering method. Lastly, it rebuilds the global knowledge necessary for the intra- and inter-trajectory analysis through a reduction operation. The comparison of our method with a traditional approach for a villin headpiece subdomain shows that our method generates significant improvements in execution time, memory usage, and data movement. Specifically, to analyze the same trajectory consisting of 20,000 protein conformations, our method runs in 41.5 seconds while the traditional approach takes approximately 3 hours, uses 6.9 MB of memory per core while the traditional method uses 16 GB on the single node where the analysis is performed, and communicates only 4.4 KB while the traditional method moves the entire dataset of 539 MB. The overall results in this paper support our claim that our method is suitable for in-situ data analysis of folding trajectories.
Keywords :
bioinformatics; data analysis; distributed processing; pattern clustering; proteins; distributed method; geometric shape features; global knowledge necessary; in-situ data analysis; intertrajectory analysis; intratrajectory analysis; large protein-folding trajectory datasets; memory usage; probabilistic hierarchical clustering method; protein conformation; villin headpiece subdomain; Correlation; Crystals; Data analysis; Data mining; Feature extraction; Proteins; Trajectory;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
ISSN :
1530-2075
Print_ISBN :
978-1-4799-3799-8
Type :
conf
DOI :
10.1109/IPDPS.2014.33
Filename :
6877257