DocumentCode :
3491269
Title :
Genomic signatures for metagenomic data analysis: Exploiting the reverse complementarity of tetranucleotides
Author :
Gori, Fabio ; Mavroeidis, Dimitrios ; Jetten, Mike S M ; Marchiori, Elena
Author_Institution :
iCIS, Radboud Univ. Nijmegen, Nijmegen, Netherlands
fYear :
2011
fDate :
2-4 Sept. 2011
Firstpage :
149
Lastpage :
154
Abstract :
Metagenomics studies microbial communities by analyzing genomic content sequenced directly from the environment. To this end, metagenomic datasets, which consist of many short DNA or RNA fragments, are analyzed computationally with statistical and machine learning methods, generally for binning or taxonomic annotation. Many of these methods act on features derived from the data through a genomic signature; a typical genomic signature of a fragment is a vector whose entries give the frequency with which each oligonucleotide appears in that fragment. In this article we analyze experimentally how well existing genomic signatures support discrimination between fragments belonging to different genomes. We also propose new genomic signatures that account for the fact that a fragment may have been sequenced from either strand of a genome; this is achieved by exploiting the reverse complementarity of oligonucleotides. We conduct extensive experiments on in silico sampled genomic fragments to compare the effectiveness of existing genomic signatures with that of the signatures proposed in this article. The results indicate that using the reverse complementarity of tetranucleotides directly in the definition of a genomic signature yields performance comparable to that of the best existing signatures while using fewer features. The proposed genomic signatures therefore provide an alternative set of features for analyzing metagenomic data. Online supplementary material is available at http://www.cs.ru.nl/~gori/signature metagenomics/.
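The two kinds of signature described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`kmer_signature`, `rc_merged_signature`) are hypothetical, and merging a tetranucleotide with its reverse complement by summing their frequencies is only one plausible way to exploit reverse complementarity; the signatures defined in the paper may differ.

```python
from collections import Counter
from itertools import product

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmer_signature(fragment, k=4):
    """Relative frequency of every k-mer over ACGT in the fragment.

    Windows containing other symbols (e.g. N) are ignored.
    """
    fragment = fragment.upper()
    counts = Counter(fragment[i:i + k] for i in range(len(fragment) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def rc_merged_signature(fragment, k=4):
    """Assumed strand-independent variant: each k-mer is pooled with its
    reverse complement under the lexicographically smaller of the two."""
    merged = {}
    for kmer, freq in kmer_signature(fragment, k).items():
        canonical = min(kmer, reverse_complement(kmer))
        merged[canonical] = merged.get(canonical, 0.0) + freq
    return merged
```

Under these assumptions, the plain tetranucleotide signature has 256 entries, while the merged one has 136 (120 reverse-complementary pairs plus 16 tetranucleotides that are their own reverse complement), and a fragment and its reverse complement map to the same merged vector.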
Keywords :
DNA; bioinformatics; biological techniques; genomics; learning (artificial intelligence); microorganisms; molecular biophysics; molecular configurations; statistical analysis; DNA fragments; RNA fragments; binning; fragment discrimination; genomic content; genomic signatures; machine learning methods; metagenomic data analysis; microbial communities; oligonucleotide frequency; statistical methods; taxonomic annotation; tetranucleotide reverse complementarity; vector; Bioinformatics; Conferences; DNA; Data analysis; Genomics; Organisms; Systems biology; genome signature; metagenome binning; metagenomic data analysis; metagenomics; taxonomic annotation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Systems Biology (ISB), 2011 IEEE International Conference on
Conference_Location :
Zhuhai
Print_ISBN :
978-1-4577-1661-4
Electronic_ISBN :
978-1-4577-1665-2
Type :
conf
DOI :
10.1109/ISB.2011.6033147
Filename :
6033147