• DocumentCode
    16497
  • Title
    Naïve Bayesian Classifiers with Multinomial Models for rRNA Taxonomic Assignment
  • Author
    Kuan-Liang Liu; Tzu-Tsung Wong
  • Author_Institution
    Inst. of Inf. Manage., Nat. Cheng Kung Univ., Tainan, Taiwan
  • Volume
    10
  • Issue
    5
  • fYear
    2013
  • fDate
    Sept.-Oct. 2013
  • Firstpage
    1
  • Lastpage
    1
  • Abstract
    The introduction of next-generation sequencing in ecological studies has created a major revolution in microbial and fungal ecology. Direct sequencing of hypervariable regions from ribosomal RNA genes can provide rapid and inexpensive analysis of ecological communities. To gain a deeper understanding from these rRNA fragments, the Ribosomal Database Project developed the 'RDP Classifier', which applies Bayes' theorem to 8-mer nucleotide frequencies to obtain taxonomic affiliation. The classifier is computationally efficient and works well with massive numbers of short sequences. However, the binary model employed in the RDP Classifier does not consider the repeated 8-mers in each reference sequence. Previous studies have pointed out that a multinomial model usually yields better performance than a binary model. In this study, we present naïve Bayesian classifiers with multinomial models that take repeated 8-mers into account when classifying microbial 16S and fungal 28S rRNA sequences. The results obtained from the multinomial approach were compared with those obtained from the binomial RDP Classifier on 250-bp, 400-bp, 800-bp, and full-length reads, demonstrating that the multinomial approach generally achieves higher prediction accuracy in most hypervariable regions.
  • Keywords
    Bayes methods; RNA; bioinformatics; genetics; genomics; microorganisms; molecular biophysics; molecular configurations; Bayesian theorem; binomial RDP classifier; eight-mer nucleotide frequencies; fungal ecology; hypervariable regions; microbial ecology; multinomial models; naive Bayesian classifiers; next generation sequencing; rRNA fragments; rRNA sequences; rRNA taxonomic assignment; ribosomal RNA genes; ribosomal database project; Biological system modeling; Clustering methods; Computational modeling; Data mining; Sequential analysis; Databases; Mining methods and algorithms; Training; Clustering, classification, and association rules
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2013.114
  • Filename
    6604390
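
The abstract contrasts a binary model, which records only whether an 8-mer is present, with a multinomial model that also counts repeated 8-mers. The following is a minimal Python sketch of the multinomial idea only, not the authors' implementation or the RDP Classifier's code. The function names, the add-one smoothing over all 4^8 possible words, and the uniform prior over taxa are assumptions made for illustration.

```python
import math
from collections import Counter

K = 8  # word length named in the abstract (8-mers)


def kmers(seq, k=K):
    """Return every overlapping k-mer in seq, repeats included."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train_multinomial(references, k=K):
    """Build per-taxon k-mer counts from reference sequences.

    references: dict mapping a taxon name to a list of sequences
    (hypothetical input layout). Repeated k-mers are counted each
    time they occur, which is what separates this from a binary model.
    """
    model = {}
    for taxon, seqs in references.items():
        counts = Counter()
        for s in seqs:
            counts.update(kmers(s, k))
        model[taxon] = (counts, sum(counts.values()))
    return model


def log_likelihood(query, counts, total, k=K, alpha=1.0):
    """Multinomial log-likelihood of the query's k-mers under one taxon.

    Uses add-alpha smoothing over 4**k possible words (an assumption).
    The multinomial coefficient is omitted because it is the same for
    every taxon and does not affect the ranking.
    """
    vocab = 4 ** k
    ll = 0.0
    for word in kmers(query, k):
        ll += math.log((counts[word] + alpha) / (total + alpha * vocab))
    return ll


def classify(query, model, k=K):
    """Return the taxon with the highest log-likelihood (uniform prior assumed)."""
    return max(model, key=lambda t: log_likelihood(query, *model[t], k=k))


# Toy usage with made-up sequences and taxon labels.
refs = {
    "taxon_A": ["ACGTACGTACGTACGT"],
    "taxon_B": ["TTTTGGGGCCCCAAAA"],
}
model = train_multinomial(refs)
best = classify("ACGTACGTAC", model)
```

In the toy data, the word ACGTACGT occurs three times in the taxon_A reference, and the multinomial counts keep all three occurrences where a presence/absence model would record only one.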