Title :
Comparison of genomes using high-performance parallel computing
Author :
Almeida, N.F., Jr. ; Caceres, E.N. ; Alves, C.E.R. ; Song, S.W.
Author_Institution :
Univ. Fed. de Mato Grosso do Sul, Brazil
Abstract :
Comparing the DNA sequences and genes of two genomes helps to investigate the functionalities that the corresponding organisms share and to understand how genes, or groups of genes, are organized and involved in various functions. We use high-performance parallel computing to compare the whole genomes of two organisms, Xanthomonas axonopodis pv. citri (Xac) and Xanthomonas campestris pv. campestris (Xcc), each with more than five million base pairs. Our purpose is twofold. First, we exploit the computing power of a cluster of low-cost microcomputers, propose a parallel solution to this problem, and show its feasibility with an implementation and performance results. Second, we extend the comparison of the two genomes: we locate and compare not only the homologous genes (expressed in the 20-letter amino acid alphabet) but also the regions, or gaps, between corresponding homologous genes (expressed in the 4-letter DNA nucleotide alphabet). The parallel platform is a 64-node Beowulf cluster of low-cost microcomputers. Xac has 5175554 base pairs and 4313 protein-coding genes, while Xcc has 5076187 base pairs and 4182 protein-coding genes. The parallel solution is based on dynamic programming and, compared with approaches based on BLAST and EGG, not only takes less processing time but also gives better-quality results.
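The abstract names dynamic programming as the basis of the parallel solution but does not give the recurrence or scoring scheme. As a minimal sketch only, the Python function below computes a global alignment score in the Needleman-Wunsch style. The function name `alignment_score` and the `match`, `mismatch`, and `gap` values are illustrative assumptions, not the authors' parameters. The table is filled by anti-diagonals because cells on the same anti-diagonal do not depend on each other, which is one common way such computations are split across processors; the paper's actual partitioning may differ.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty.

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against an empty prefix
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against an empty prefix

    # Sweep anti-diagonals d = i + j. Each cell needs only cells from
    # diagonals d-1 and d-2, so the cells of one diagonal are independent
    # and could be computed concurrently.
    for d in range(2, n + m + 1):
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    return score[n][m]
```

The same function accepts nucleotide strings (4-letter alphabet) or amino acid strings (20-letter alphabet), since it only tests characters for equality. A realistic protein comparison would likely replace the match/mismatch constants with a substitution matrix.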
Keywords :
DNA; biology computing; computational complexity; dynamic programming; genetics; microcomputers; parallel algorithms; DNA nucleotide; DNA sequence; Xanthomonas genome organism; amino acid; genome comparison; homologous gene; homologous genes comparison; microcomputer; parallel computing; Amino acids; Bioinformatics; Costs; Genomics; Organisms; Parallel processing; Proteins; Sequences
Conference_Title :
Computer Architecture and High Performance Computing, 2003. Proceedings. 15th Symposium on
Print_ISBN :
0-7695-2046-4
DOI :
10.1109/CAHPC.2003.1250332