DocumentCode
3673220
Title
Assembly independent functional annotation of short-read data using SOFA: Short-ORF functional annotation
Author
Aria S. Hahn;Niels W. Hanson;Dongjae Kim;Kishori M. Konwar;Steven J. Hallam
Author_Institution
Department of Microbiology, University of British Columbia, Vancouver, Canada
fYear
2015
Firstpage
1
Lastpage
6
Abstract
Accurate description of the microbial communities driving matter and energy transformations in complex ecosystems such as soils cannot yet be effectively accomplished using assembly-based approaches, despite the rise of next-generation sequencing technologies. Here we present SOFA, an open-source pipeline enabling comparative functional annotation of unassembled short-read data. The pipeline attempts to merge mate pairs in FASTQ files, predicts open reading frames (ORFs) on merged and unmerged reads as short as 70 bp, and completes an additional step we term 'deduplication'. Deduplication prevents the double counting of ORFs predicted from unmerged paired-end reads by checking for homologous annotations that span the same ORF, allowing for quantitatively accurate predictions. The effectiveness of SOFA is validated with both simulated and bona fide soil metagenomes, and empirical results are compared to existing strategies for obtaining accurate ORF counts and to an analytical model of read duplication. SOFA enables downstream processing stages within the existing MetaPathways pipeline and is available for download as a standalone application at https://github.com under the MIT license.
Keywords
"Bioinformatics","Pipelines","Genomics","Databases","Soil","Proteins","Assembly"
Publisher
ieee
Conference_Titel
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2015 IEEE Conference on
Type
conf
DOI
10.1109/CIBCB.2015.7300324
Filename
7300324