DocumentCode
2989588
Title
Parallelization of the functional flow algorithm for prediction of protein function using protein-protein interaction networks
Author
Akkoyun, Emrah ; Can, Tolga
Author_Institution
Dept. of Med. Inf., Middle East Tech. Univ., Ankara, Turkey
fYear
2011
fDate
4-8 July 2011
Firstpage
56
Lastpage
62
Abstract
Protein-protein interaction networks provide important information about the functions of proteins. Various studies analyze interaction networks and predict the functions of novel proteins from their network connectivity. However, all of these methods are sequential and do not make use of high performance computing. Functional flow is one such method: it uses network connectivity, the distance effect, and the topology of the network, with both local and global views, to predict protein function. Owing to these properties, the functional flow algorithm produces more accurate results than other techniques. However, because no parallel version of the algorithm exists, the method cannot practically be applied to large-scale networks of complex species. In this paper, we provide a parallel implementation of functional flow using Hadoop, an open-source map/reduce environment. For our experiments, we installed Hadoop on 18 hosts with eight cores each. The first map/reduce job distributes the protein interaction network in a format that allows parallel distributed computing on all worker nodes. The subsequent map/reduce jobs generate flows for each known protein function, and the functions of novel proteins are predicted by accumulating all of these generated flows. Our experiments show that the method can be distributed efficiently across worker nodes and that the application's performance improves as the number of resources increases.
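The flow-generation and accumulation steps summarized in the abstract can be illustrated with a small single-process Python sketch. This is not the authors' Hadoop implementation. It assumes the usual functional flow rules: annotated proteins act as infinite reservoirs, flow only moves "downhill" to a less full neighbor, each transfer is capped by the edge weight and by the sender's weight-proportional share, and a protein's score is the total flow it receives over a fixed number of steps (6 here, an assumed default). The names `flow_events`, `map_phase`, `reduce_phase`, and `predict` are hypothetical. The map step runs one independent simulation per known function, and the reduce step sums the generated flows per (protein, function) key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set, Tuple

# Undirected PPI network as a symmetric adjacency map: protein -> {neighbor: weight > 0}.
Graph = Dict[str, Dict[str, float]]
# Known annotations: function -> set of proteins annotated with it.
Annotations = Dict[str, Set[str]]


def flow_events(graph: Graph, sources: Set[str], steps: int = 6) -> Iterator[Tuple[str, float]]:
    """Yield (receiving protein, amount) for every flow in one function's simulation."""
    inf = float("inf")
    # Proteins annotated with the function are infinite sources; all others start empty.
    reservoir = {u: (inf if u in sources else 0.0) for u in graph}
    for _ in range(steps):
        step_flows = []
        for u, nbrs in graph.items():
            total_w = sum(nbrs.values())
            for v, w in nbrs.items():
                if reservoir[u] < reservoir[v]:
                    continue  # flow only goes downhill, to a less full reservoir
                # Capped by edge capacity and by u's weight-proportional share.
                step_flows.append((u, v, min(w, reservoir[u] * w / total_w)))
        # All flows in a step use the previous reservoirs; apply them afterwards.
        for u, v, g in step_flows:
            reservoir[u] -= g
            reservoir[v] += g
            yield v, g


def map_phase(graph: Graph, annotations: Annotations, steps: int = 6):
    """Hypothetical map step: one independent flow simulation per known function."""
    for func, proteins in annotations.items():
        for protein, g in flow_events(graph, proteins, steps):
            yield (protein, func), g


def reduce_phase(pairs: Iterable[Tuple[Tuple[str, str], float]]) -> Dict[str, Dict[str, float]]:
    """Hypothetical reduce step: accumulate all flows entering each protein, per function."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for key, g in pairs:
        totals[key] += g
    scores: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (protein, func), s in totals.items():
        scores[protein][func] = s
    return dict(scores)


def predict(graph: Graph, annotations: Annotations, steps: int = 6) -> Dict[str, str]:
    """Assign each unannotated protein the function with the highest accumulated flow."""
    scores = reduce_phase(map_phase(graph, annotations, steps))
    annotated = set().union(*annotations.values())
    return {p: max(fs, key=fs.get) for p, fs in scores.items() if p not in annotated}


# Path A-B-C-D: A is annotated with f1, D with f2.
graph = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0},
         "C": {"B": 1.0, "D": 1.0}, "D": {"C": 1.0}}
print(predict(graph, {"f1": {"A"}, "f2": {"D"}}))  # {'B': 'f1', 'C': 'f2'}
```

Because each function's simulation is independent, the map step parallelizes naturally across functions in this sketch; a real Hadoop job would also need to partition the network itself, which the abstract assigns to the first map/reduce job.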
Keywords
bioinformatics; parallel processing; proteins; public domain software; Hadoop; MapReduce; distance effect; functional flow algorithm; high performance computing; network connectivity; network topology; open source map/reduce environment; parallel distributed computing; parallel implementation; protein function prediction; protein-protein interaction network; sequential method; Computers; Distributed computing; Prediction algorithms; Reservoirs; Bioinformatics and Biocomputing; Network Flow; Parallel and Distributed Computing; Protein-Protein Interactions;
fLanguage
English
Publisher
ieee
Conference_Titel
High Performance Computing and Simulation (HPCS), 2011 International Conference on
Conference_Location
Istanbul
Print_ISBN
978-1-61284-380-3
Type
conf
DOI
10.1109/HPCSim.2011.5999807
Filename
5999807