DocumentCode
234825
Title
A lexicon pool augmented Naive Bayes Classifier for Nepali Text
Author
Thakur, S.K. ; Singh, V.K.
Author_Institution
Dept. of Comput. Sci., South Asian Univ., New Delhi, India
fYear
2014
fDate
7-9 Aug. 2014
Firstpage
542
Lastpage
546
Abstract
This paper presents our experimental work on machine classification of Nepali texts. We implemented a Naive Bayes classifier for the task and then augmented it with multinomial lexicon pooling. The lexicon-pooled Naive Bayes classifier obtains better classification results than a plain Naive Bayes implementation. This hybrid approach also helps compensate for the unavailability of linguistic resources for Nepali (such as a stemmer, a stop word list, and an accurate POS tagger). The proposed lexicon-pooled Naive Bayes approach is evaluated on a sufficiently large dataset of Nepali news stories. The experimental results demonstrate the higher classification accuracy and the usefulness of the method for Nepali text classification. The paper also contributes resources to Nepali language processing, in the form of a Nepali news stories corpus and a domain-specific lexicon for Nepali news stories.
Keywords
Bayes methods; computational linguistics; natural language processing; pattern classification; text analysis; Nepali language processing; Nepali news stories corpus; Nepali text classification; domain specific lexicon; lexicon pool augmented naive Bayes classifier; linguistic resources; machine classification; multinomial lexicon pooling; normal naive Bayes implementation; Accuracy; Pragmatics; Probability; Text categorization; Training; Training data; Vocabulary; Multinomial Lexicon Pooling; Naive Bayes; Nepali Text Corpus; Text Classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Contemporary Computing (IC3), 2014 Seventh International Conference on
Conference_Location
Noida
Print_ISBN
978-1-4799-5172-7
Type
conf
DOI
10.1109/IC3.2014.6897231
Filename
6897231