Title of article :
The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering
Author/Authors :
Khalid Jaber، نويسنده , , NurʹAini Abdul Rashid، نويسنده , , Rosni Abdullah، نويسنده ,
Issue Information :
روزنامه با شماره پیاپی سال 2009
Abstract :
Problem statement: Protein sequence clustering is a method used to discover relations between proteins. This method groups the proteins based on their common features. It is a core process in protein sequence classification. Graph theory has been used in protein sequence clustering as a means of partitioning the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences and we then parallelized the algorithm to improve computation times and allowed large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel program model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster using the C programming language and a hybrid approach that includes both the Message Passing Interface (MPI) library and POSIX threads (PThread) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over sequential algorithms for cliques in protein sequences.
Keywords :
Maximal cliques algorithm , Parallel processing , protein sequence clustering , MPI , pthread
Journal title :
American Journal of Applied Sciences
Journal title :
American Journal of Applied Sciences