DocumentCode :
2933886
Title :
Optimizing Phylogenetic Analysis Using SciHmm Cloud-based Scientific Workflow
Author :
Ocaña, Kary A C S ; De Oliveira, Daniel ; Dias, Jonas ; Ogasawara, Eduardo ; Mattoso, Marta
Author_Institution :
COPPE, Fed. Univ. of Rio de Janeiro, Rio de Janeiro, Brazil
fYear :
2011
fDate :
5-8 Dec. 2011
Firstpage :
62
Lastpage :
69
Abstract :
Phylogenetic analysis and multiple sequence alignment (MSA) are closely related bioinformatics fields. Phylogenetic analysis makes extensive use of MSA in the construction of phylogenetic trees, which are used to infer the evolutionary relationships between homologous genes. These bioinformatics experiments are usually modeled as scientific workflows. Many alternative workflows use different MSA methods to conduct phylogenetic analysis, and each can produce MSAs of different quality. Scientists have to explore which MSA method is the most suitable for their experiments. However, workflows for phylogenetic analysis are both computationally and data intensive, and they may run sequentially for weeks. Although there are many approaches that parallelize these workflows, exploring all MSA methods may become a burdensome and expensive task. If scientists knew the most adequate MSA method a priori, it would save time and money. To optimize the phylogenetic analysis workflow, we propose in this paper SciHmm, a bioinformatics scientific workflow based on profile hidden Markov models (pHMMs) that aims at determining the most suitable MSA method for a phylogenetic analysis prior to executing the phylogenetic workflow. SciHmm is also executed in parallel in a cloud environment using the SciCumulus middleware. The results demonstrate that optimizing a phylogenetic analysis using SciHmm considerably reduces the total execution time of the phylogenetic analysis (by up to 80%). The optimization also improves the quality of the biological results. In addition, the parallel execution of SciHmm demonstrates that this kind of bioinformatics workflow is suitable for execution in the cloud.
Keywords :
bioinformatics; cloud computing; evolution (biological); genetics; hidden Markov models; middleware; scientific information systems; workflow management software; MSA method; SciCumulus middleware; SciHmm cloud-based scientific workflow; alternative workflows; bioinformatics field; bioinformatics scientific workflow; cloud environment; evolutionary relationship; hidden Markov model; homologous genes; multiple sequence alignment; phylogenetic analysis; phylogenetic trees; scientific workflows; Bioinformatics; Databases; Genomics; Hidden Markov models; Muscles; Phylogeny; Proteins; cloud computing; profile hidden Markov models; scientific workflows;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
E-Science (e-Science), 2011 IEEE 7th International Conference on
Conference_Location :
Stockholm
Print_ISBN :
978-1-4577-2163-2
Type :
conf
DOI :
10.1109/eScience.2011.17
Filename :
6123260