Title :
An automated infrastructure to support high-throughput bioinformatics
Author :
Cuccuru, Gianmauro ; Leo, Simone ; Lianas, Luca ; Muggiri, Michele ; Pinna, Andrea ; Pireddu, Luca ; Uva, Paolo ; Angius, Alessio ; Fotia, Giorgio ; Zanetti, Gianluigi
Author_Institution :
CRS4, Pula, Italy
Abstract :
The number of domains affected by the big data phenomenon is constantly increasing, both in science and industry, with high-throughput DNA sequencers among the largest data producers. Building analysis frameworks that can keep up with such a high production rate, however, is only part of the problem: current challenges include dealing with intricate data repositories where objects are connected by multiple relationships; managing complex processing pipelines where each step depends on a large number of configuration parameters; and ensuring reproducibility, error control, and usability by non-technical staff. Here we describe an automated infrastructure built to address these issues in the context of analyzing the data produced by the CRS4 next-generation sequencing facility. The system integrates open source tools, either written by us or publicly available, into a framework that can handle the whole data transformation process, from raw sequencer output to primary analysis results.
Keywords :
Big Data; bioinformatics; CRS4 next generation sequencing facility; automated infrastructure; building analysis frameworks; data repositories; data transformation process; error control; high throughput DNA sequencers; high throughput bioinformatics; massive data producers; open source tools; raw sequencer output; reproducibility; usability; Genomics; Muscles; Simple object access protocol; MapReduce; NGS
Conference_Title :
High Performance Computing & Simulation (HPCS), 2014 International Conference on
Conference_Location :
Bologna
Print_ISBN :
978-1-4799-5312-7
DOI :
10.1109/HPCSim.2014.6903742