• DocumentCode
    234825
  • Title
    A lexicon pool augmented Naive Bayes Classifier for Nepali Text
  • Author
    Thakur, S.K.; Singh, V.K.
  • Author_Institution
    Dept. of Comput. Sci., South Asian Univ., New Delhi, India
  • fYear
    2014
  • fDate
    7-9 Aug. 2014
  • Firstpage
    542
  • Lastpage
    546
  • Abstract
    This paper presents our experimental work on machine classification of Nepali texts. We implemented a Naive Bayes classifier for the task and then augmented it through multinomial lexicon pooling. The lexicon-pooled Naive Bayes classifier obtains better results on the classification task than a normal Naive Bayes implementation. This hybrid approach also helps compensate for the unavailability of linguistic resources for Nepali (such as a stemmer, a stop word list, and an accurate POS tagger). The proposed lexicon-pooled Naive Bayes approach is evaluated on a sufficiently large dataset of Nepali news stories. The experimental results demonstrate the higher classification accuracy and the usefulness of the method for Nepali text classification. The paper also contributes resources to Nepali language processing, in the form of a Nepali news stories corpus and a domain-specific lexicon for Nepali news stories.
  • Keywords
    Bayes methods; computational linguistics; natural language processing; pattern classification; text analysis; Nepali language processing; Nepali news stories corpus; Nepali text classification; domain specific lexicon; lexicon pool augmented naive Bayes classifier; linguistic resources; machine classification; multinomial lexicon pooling; normal naive Bayes implementation; Accuracy; Pragmatics; Probability; Text categorization; Training; Training data; Vocabulary; Multinomial Lexicon Pooling; Naive Bayes; Nepali Text Corpus; Text Classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Contemporary Computing (IC3), 2014 Seventh International Conference on
  • Conference_Location
    Noida
  • Print_ISBN
    978-1-4799-5172-7
  • Type
    conf
  • DOI
    10.1109/IC3.2014.6897231
  • Filename
    6897231