Title :
Cloud statistical significance estimation for optimal local alignment of huge DNA sequences
Author :
Hosny, Ahmad M. ; Shedeed, Howida A. ; Hussein, Ashraf S. ; Tolba, Mohamed F.
Author_Institution :
Dept. of Sci. Comput., Ain Shams Univ., Cairo, Egypt
Abstract :
Assessing confidence in a pairwise local sequence alignment is a fundamental problem in bioinformatics. For huge DNA sequences, this problem is highly compute-intensive because it involves evaluating thousands of local alignments to construct an empirical score distribution. Recent parallel solutions support only small sequence sizes and/or rely on sophisticated infrastructures that are not available to most research labs. This paper presents an efficient parallel solution for evaluating the statistical significance of the alignment of a pair of huge DNA sequences using cloud infrastructures. The solution can receive requests from various researchers via a web portal and allocate resources according to demand. Being cloud-based, it improves robustness, scalability and performance. The fundamental innovation of this work is an efficient solution that uses both shared and distributed memory architectures, through cloud technology, to speed up the evaluation of statistical significance for a pair of DNA sequences. In this manner, the restriction on sequence size is relaxed to the megabyte scale, which was not supported before. The present solution was verified against other recent parallel solutions, and the performance evaluation was carried out on Microsoft's cloud. The results show that performance scales with approximately linear speedup as the number of instances increases.
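The empirical score distribution mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' scoring scheme, randomization model, or implementation, so everything below is an assumption made for illustration: a Smith-Waterman score with a linear gap penalty, hypothetical scoring parameters (`match`, `mismatch`, `gap`), a simple shuffle of the second sequence as the null model, and the hypothetical function names `local_alignment_score` and `empirical_p_value`.

```python
import random


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    The scoring parameters are illustrative assumptions, not the paper's.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            if cell > best:
                best = cell
        prev = curr
    return best


def empirical_p_value(a, b, n_shuffles=200, seed=0):
    """Estimate significance of the observed score by shuffling b.

    Returns (observed_score, p_value). The p-value is the add-one
    estimate: (shuffled scores >= observed, plus 1) / (n_shuffles + 1).
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # assumed null model: same composition, random order
        if local_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)
```

Each shuffled alignment is independent of the others, which is why this kind of computation lends itself to being spread across cores and machine instances. The sketch itself runs serially and is only practical for short sequences.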
Keywords :
DNA; bioinformatics; cloud computing; portals; statistical analysis; DNA sequences; Microsoft cloud; Web-portal; cloud infrastructures; cloud statistical significance estimation; cloud-based solution; empirical score distribution; megabyte-scale; optimal local alignment; pairwise local sequence alignment; parallel solutions; Computer architecture; Computers; Estimation; Parallel processing; megabase DNA sequence; multi-core architectures; sequence alignment; statistical significance estimation
Conference_Title :
Informatics and Systems (INFOS), 2012 8th International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4673-0828-1