DocumentCode :
108820
Title :
Genomic Region Operation Kit for Flexible Processing of Deep Sequencing Data
Author :
Ovaska, Kristian ; Lyly, Lauri ; Sahu, B. ; Janne, Olli A. ; Hautaniemi, Sampsa
Author_Institution :
Genome-Scale Biol. & Inst. of Biomed., Univ. of Helsinki, Helsinki, Finland
Volume :
10
Issue :
1
fYear :
2013
fDate :
Jan.-Feb. 2013
Firstpage :
200
Lastpage :
206
Abstract :
Computational analysis of data produced in deep sequencing (DS) experiments is challenging due to large data volumes and requirements for flexible analysis approaches. Here, we present a mathematical formalism based on set algebra for frequently performed operations in DS data analysis to facilitate translation of biomedical research questions into a language amenable to computational analysis. With the help of this formalism, we implemented the Genomic Region Operation Kit (GROK), which supports various DS-related operations such as preprocessing, filtering, file conversion, and sample comparison. GROK provides high-level interfaces for R, Python, Lua, and command line, as well as an extension C++ API. It supports major genomic file formats and allows storing custom genomic regions in efficient data structures such as red-black trees and SQL databases. To demonstrate the utility of GROK, we have characterized the roles of two major transcription factors (TFs) in prostate cancer using data from 10 DS experiments. GROK is freely available with a user guide from http://csbi.ltdk.helsinki.fi/grok/.
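The set-algebra formalism mentioned in the abstract treats collections of genomic regions as sets that can be combined with union, intersection, and difference. The following is a minimal sketch of that idea, not GROK's implementation or API. It assumes regions are 0-based, half-open (chromosome, start, end) tuples and that each region set is first normalized by merging overlapping or adjacent intervals. The function names `union`, `intersection`, and `difference` are hypothetical. GROK stores regions in structures such as red-black trees or SQL databases, whereas this sketch simply uses per-chromosome sorted lists and a linear sweep.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical region representation: (chromosome, start, end), 0-based, half-open.
Region = Tuple[str, int, int]


def _by_chrom(regions: Iterable[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Group regions by chromosome; merge overlapping or adjacent intervals."""
    grouped: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        if start < end:  # skip empty regions
            grouped[chrom].append((start, end))
    merged: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, ivs in grouped.items():
        ivs.sort()
        out: List[Tuple[int, int]] = []
        for s, e in ivs:
            if out and s <= out[-1][1]:  # overlaps or touches the previous one
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        merged[chrom] = out
    return merged


def union(a: Iterable[Region], b: Iterable[Region]) -> List[Region]:
    """Regions covered by A or B."""
    m = _by_chrom(list(a) + list(b))
    return [(c, s, e) for c in sorted(m) for s, e in m[c]]


def intersection(a: Iterable[Region], b: Iterable[Region]) -> List[Region]:
    """Regions covered by both A and B (two-pointer sweep per chromosome)."""
    ma, mb = _by_chrom(a), _by_chrom(b)
    result: List[Region] = []
    for chrom in sorted(set(ma) & set(mb)):
        x, y = ma[chrom], mb[chrom]
        i = j = 0
        while i < len(x) and j < len(y):
            s, e = max(x[i][0], y[j][0]), min(x[i][1], y[j][1])
            if s < e:
                result.append((chrom, s, e))
            # Advance whichever interval ends first.
            if x[i][1] < y[j][1]:
                i += 1
            else:
                j += 1
    return result


def difference(a: Iterable[Region], b: Iterable[Region]) -> List[Region]:
    """Regions covered by A but not by B."""
    ma, mb = _by_chrom(a), _by_chrom(b)
    result: List[Region] = []
    for chrom in sorted(ma):
        cuts = mb.get(chrom, [])
        j = 0
        for s, e in ma[chrom]:
            cur = s
            # Skip subtracted intervals that end before this interval begins.
            while j < len(cuts) and cuts[j][1] <= cur:
                j += 1
            k = j
            while k < len(cuts) and cuts[k][0] < e:
                if cuts[k][0] > cur:
                    result.append((chrom, cur, cuts[k][0]))
                cur = max(cur, cuts[k][1])
                k += 1
            if cur < e:
                result.append((chrom, cur, e))
    return result


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
    mask = [("chr1", 250, 400), ("chr2", 10, 20)]
    print(union(peaks, mask))         # [('chr1', 100, 400), ('chr2', 0, 50)]
    print(intersection(peaks, mask))  # [('chr1', 250, 300), ('chr2', 10, 20)]
    print(difference(peaks, mask))    # [('chr1', 100, 250), ('chr2', 0, 10), ('chr2', 20, 50)]
```

Normalizing each set before combining keeps the sweeps linear in the number of intervals after sorting; a tree- or database-backed store, as GROK uses, would instead favour random-access overlap queries on large region collections.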
Keywords :
SQL; algebra; bioinformatics; cancer; genomics; GROK tool; Genomic Region Operation Kit; SQL database; biomedical research questions; computational analysis; data volume; deep sequencing data; file conversion; filtering; flexible processing; preprocessing; prostate cancer; red-black trees; sample comparison; set algebra; transcription factor; Algebra; Benchmark testing; Bioinformatics; Complexity theory; Databases; Genomics; Software; Bioinformatics; deep sequencing; genomic data analysis; region set algebra; software; Computational Biology; Gene Expression Profiling; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; Male; Models, Genetic; Prostatic Neoplasms; Software; Transcription Factors;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2012.170
Filename :
6399464