DocumentCode :
3225118
Title :
Binning DNA fragment of metagenome using a novel model
Author :
Hou Tao ; Liu Yun ; Liu Fu ; Wang Ke ; Xie Jian
Author_Institution :
Coll. of Commun. Eng., Jilin Univ., Changchun, China
fYear :
2015
fDate :
23-25 May 2015
Firstpage :
4760
Lastpage :
4765
Abstract :
An essential task in metagenomic data analysis is to predict the source organism of each DNA fragment in a sequenced metagenome, which helps link gene functions to members of the community and estimate the microbial abundance of the studied sample. Several classifiers have been developed to assign DNA fragments from a metagenome to their source organisms. However, most existing classifiers suffer from low classification accuracy at the genus level. One major reason is that they cannot accurately discriminate between training data from different taxonomic classes when the training data contain outliers. The training genomic data (bacterial and archaeal genomes) usually do contain a portion of outliers, arising from sources such as sequencing errors, phage invasions, and some highly expressed genes. These outliers act as noise and hinder the development of better-performing classifiers. To overcome this difficulty, we present a strategy based on the support vector data description (SVDD) model, which enhances the discriminating ability of the classifier by discarding some outliers in the training genomic data. Experiments were performed on simulated and real metagenomes. The results demonstrate that our classifier has high classification sensitivity, specificity, and accuracy, as well as a low false negative rate.
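The outlier-discarding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: instead of solving the SVDD optimization, it uses a simplified stand-in, a sphere centered on the class mean whose radius is set so that a fixed fraction of the farthest training points falls outside. The dinucleotide features, the `reject_fraction` parameter, the toy sequences, and all function names are assumptions made for illustration; the abstract does not specify any of them.

```python
from itertools import product
import math

K = 2  # assumed k-mer length; the abstract does not name the features used
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_profile(seq):
    """Normalized k-mer frequency vector of a DNA string."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - K + 1):
        word = seq[i:i + K]
        if word in counts:  # skip words with N or other symbols
            counts[word] += 1
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in KMERS]

def fit_sphere(vectors, reject_fraction=0.2):
    """Mean-centered sphere that leaves out the farthest training points.

    Simplified stand-in for SVDD: no kernel, no quadratic program.
    """
    dim = len(vectors[0])
    center = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    dists = sorted(math.dist(v, center) for v in vectors)
    n_drop = int(len(dists) * reject_fraction)  # points treated as outliers
    radius = dists[len(dists) - n_drop - 1]
    return center, radius

def classify(seq, spheres):
    """Label of the sphere the fragment lies deepest inside, else None."""
    x = kmer_profile(seq)
    scores = {label: math.dist(x, c) - r for label, (c, r) in spheres.items()}
    label = min(scores, key=scores.get)
    return label if scores[label] <= 0 else None

# Hypothetical toy training set: the last sequence in each class is an outlier.
train = {
    "at_rich": ["ATATATATAT", "AATTAATTAA", "TTATTAATAT", "ATTTAAATTA",
                "GCGCGCGCGC"],
    "gc_rich": ["GCGCGCGCGC", "GGCCGGCCGG", "CCGCCGGCGC", "GCCCGGGCCG",
                "ATATATATAT"],
}
spheres = {
    label: fit_sphere([kmer_profile(s) for s in seqs])
    for label, seqs in train.items()
}
```

Calling `classify("ATTATAATTA", spheres)` assigns the fragment to whichever class sphere contains it, and returns `None` for a fragment that lies outside every sphere. A real SVDD would instead learn the boundary in a kernel feature space, with a trade-off parameter controlling how many training points are left outside.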
Keywords :
DNA; biology computing; data analysis; genetics; genomics; microorganisms; pattern classification; support vector machines; DNA fragment; SVDD model; archaeal genomes; bacterial genomes; classification accuracy; classifiers; gene functions; genomic data; genus level; metagenomics data analysis; microbial abundance; organism; sequenced metagenome; support vector data description; taxonomic classes; Accuracy; Bioinformatics; DNA; Genomics; Sensitivity; Support vector machines; Training; Binning; Metagenomics; SVDD; Taxonomic classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Control and Decision Conference (CCDC), 2015 27th Chinese
Conference_Location :
Qingdao
Print_ISBN :
978-1-4799-7016-2
Type :
conf
DOI :
10.1109/CCDC.2015.7162767
Filename :
7162767