Title :
Fast Parallel Markov Clustering in Bioinformatics Using Massively Parallel Computing on GPU with CUDA and ELLPACK-R Sparse Format
Author :
Bustamam, Alhadi ; Burrage, Kevin ; Hamilton, Nicholas A.
Author_Institution :
Dept. of Math., Univ. of Indonesia, Depok, Indonesia
Abstract :
Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However, with increasing vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses CUDA tool for implementing a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient, and low-cost option to achieve substantial performance gains over CPU approaches. The use of on-chip memory on the GPU is efficiently lowering the latency time, thus, circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized ELLPACK-R sparse format to allow the effective and fine-grain massively parallel processing to cope with the sparse nature of interaction networks data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on CPU. Thus, large-scale parallel computation on off-the-shelf desktop-machines, that were previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.
Keywords :
Markov processes; bioinformatics; graphics processing units; large-scale systems; parallel architectures; CUDA tool; ELLPACK-R sparse format; GPU computing; bioinformatics; biological networks; critical limiting factor; fast Markov clustering algorithm; fast parallel Markov clustering; fine-grain massively parallel processing; interaction networks data sets; large-scale parallel computation; massively parallel computing environment; off-the-shelf desktop-machines; on-chip memory; parallel sparse Markov matrix normalizations; parallel sparse matrix-matrix computations; supercomputing architectures; Bioinformatics; Graphics processing unit; Instruction sets; Markov processes; Multicore processing; Parallel processing; Proteins; CUDA; ELLPACK-R sparse format; GPU computing; Markov clustering; PPI networks; bioinformatics.; graphs and networks; parallelism and concurrency; performance evaluation; scalable parallel programming; Cluster Analysis; Computational Biology; Computer Graphics; Computer Simulation; Markov Chains; Oligonucleotide Array Sequence Analysis;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2011.68