Regional Information Center for Science and Technology - Design and Optimization of a Metagenomics Analysis Workflow for NVRAM

DocumentCode :

167348

Title :

Design and Optimization of a Metagenomics Analysis Workflow for NVRAM

Author :

Ames, Sasha ; Allen, Jonathan E. ; Hysom, David A. ; Lloyd, G. Scott ; Gokhale, Maya B.

Author_Institution :

Lawrence Livermore Nat. Lab., Livermore, CA, USA

fYear :

2014

fDate :

19-23 May 2014

Firstpage :

556

Lastpage :

565

Abstract :

Metagenomic analysis, the study of microbial communities found in environmental samples, presents considerable challenges in data volume and computational cost. We present a novel metagenomic analysis pipeline that leverages emerging large-address-space compute nodes with NVRAM to hold a searchable, memory-mapped "k-mer" database of all known genomes and their taxonomic lineage. We describe the challenges of creating databases that are many hundreds of gigabytes in size, and we describe database organization optimizations that enable our Livermore Metagenomic Analysis Toolkit (LMAT) software to query the k-mer key-value store, which resides in high-performance flash storage, as efficiently as if it were fully in memory. To make database creation tractable, we have designed, implemented, and evaluated an optimized ingest pipeline. To optimize query performance, we present a two-level index scheme that yields speedups of 8.4× to 74× over a conventional hash table index. LMAT, including the ingest pipeline, is available as open source at SourceForge.
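The abstract names a two-level index over the k-mer key-value store but does not give its layout. As a rough illustration only, here is a minimal Python sketch of one plausible two-level scheme. It assumes 2-bit packing of nucleotides into integer keys, a first-level table of offsets indexed by the top `prefix_bits` bits of each key, and a second-level binary search over a sorted key array. The names `encode_kmer`, `TwoLevelIndex`, and `prefix_bits`, as well as the taxon IDs, are hypothetical and are not LMAT's API. One intuition for why such a layout can suit flash-backed memory is that each lookup reads one small table entry and then searches only a short contiguous range of the sorted array.

```python
from bisect import bisect_left
from typing import Dict, List, Optional

# Hypothetical 2-bit packing: A=00, C=01, G=10, T=11 (k <= 32 fits in 64 bits).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer: str) -> int:
    """Pack a DNA k-mer into an integer; the first base lands in the high bits."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


class TwoLevelIndex:
    """Illustrative two-level index: a prefix table over a sorted key array.

    Level 1 (offsets) maps the top `prefix_bits` bits of a key to the
    half-open range of the sorted key array that holds every key with that
    prefix. Level 2 is a binary search restricted to that range.
    """

    def __init__(self, entries: Dict[int, int], k: int, prefix_bits: int):
        if not 0 < prefix_bits <= 2 * k:
            raise ValueError("prefix_bits must be in (0, 2k]")
        self.shift = 2 * k - prefix_bits
        items = sorted(entries.items())
        self.keys: List[int] = [key for key, _ in items]
        self.values: List[int] = [val for _, val in items]
        # Count keys per prefix bucket, then prefix-sum into start offsets.
        n_buckets = 1 << prefix_bits
        self.offsets = [0] * (n_buckets + 1)
        for key in self.keys:
            self.offsets[(key >> self.shift) + 1] += 1
        for p in range(n_buckets):
            self.offsets[p + 1] += self.offsets[p]

    def lookup(self, key: int) -> Optional[int]:
        """Return the value stored for `key`, or None if it is absent."""
        p = key >> self.shift
        lo, hi = self.offsets[p], self.offsets[p + 1]
        i = bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return self.values[i]
        return None


# Usage: a toy database mapping 4-mers to made-up taxon IDs.
k = 4
db = {encode_kmer("ACGT"): 101, encode_kmer("GGCA"): 202, encode_kmer("TTTT"): 303}
index = TwoLevelIndex(db, k=k, prefix_bits=4)
read = "AACGTTTTT"
hits = [index.lookup(encode_kmer(read[i:i + k])) for i in range(len(read) - k + 1)]
print(hits)  # [None, 101, None, None, 303, 303]
```

The sketch keeps both levels in ordinary Python lists. To move toward the memory-mapped, flash-resident database the abstract describes, one would store the sorted key and value arrays in files and map them into memory, for example with `numpy.memmap`. That step is omitted here.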

Keywords :

bioinformatics; optimisation; query processing; random-access storage; NVRAM; k-mer key-value store; large address space compute nodes; livermore metagenomic analysis toolkit software; memory-mapped k-mer database; microbial communities; query optimisation; Arrays; Genomics; Indexes; Runtime; Taxonomy;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International

Conference_Location :

Phoenix, AZ

Print_ISBN :

978-1-4799-4117-9

Type :

conf

DOI :

10.1109/IPDPSW.2014.200

Filename :

6969435

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=167348