Title :
On the Scalability of Supervised Learners in Metagenomics
Author :
U, ManChon ; Mahamuda, Vasim ; Rasheed, Khaled
Author_Institution :
Dept. of Comput. Sci., Univ. of Georgia, Athens, GA, USA
Abstract :
Metagenomics is the study of micro-organisms, such as prokaryotes, found in samples taken from natural environments. Such samples may contain DNA from many different species of micro-organisms, including bacteria and archaea. Micro-organisms are responsible for most of the symbiotic activity on Earth. They also drive the complex chemical reactions on the Earth's surface that help maintain its ecological balance. With the growth in genome sequencing projects, the amount of assembled sequencing data has increased considerably. In this article, we apply three supervised learners, namely decision trees, Bayesian networks, and decision tables, to see how their performance degrades as the number of species present in a metagenomic sample increases. We also examine how the performance of these learners changes as the percentage of unknown sequences in the metagenomic sample is varied.
Keywords :
Bayes methods; DNA; bioinformatics; biology computing; cellular biophysics; decision tables; decision trees; environmental science computing; genetics; genomics; learning (artificial intelligence); microorganisms; molecular biophysics; Bayesian networks; DNA; archaea; assembled sequencing data; bacteria; complex chemical reactions; decision tables; decision trees; genome sequencing projects; metagenomic sample; microorganisms; supervised learner; symbiotic activity; Accuracy; Bioinformatics; Classification algorithms; Classification tree analysis; DNA; Machine learning; Bayesian Networks; Binning; Bioinformatics; Decision Trees; Machine Learning; Metagenomics;
Conference_Title :
Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4244-9211-4
DOI :
10.1109/ICMLA.2010.123