Title of article

Polychotomiser for Case-based Reasoning beyond the Traditional Bayesian Classification Approach

Author/Authors

Dino Isa, Lam Hong Lee, V.P. Kallimani, R. Prasad

Issue Information

Journal with serial number, year 2008

Abstract

This work implements an enhanced Bayesian classifier that performs better than the ordinary naive Bayes classifier on domains and datasets with varying characteristics. Text classification is an active, ongoing research field in Artificial Intelligence (AI). It is defined as the task of learning methods for categorising collections of electronic text documents into their annotated classes based on their contents. An increasing number of statistical approaches have been developed for text classification, including k-nearest neighbour classification, naive Bayes classification, decision trees, rule induction, and the support vector machine (SVM), an algorithm that implements structural risk minimisation theory. Among these approaches, naive Bayes classifiers have been widely used because of their simplicity. However, this generative method has been reported to be less accurate than discriminative methods such as the SVM. Some studies have shown that the naive Bayes classifier performs surprisingly well in many other domains with certain specialised characteristics. The main aim of this work is to quantify the weaknesses of traditional naive Bayes classification and to introduce an enhanced Bayesian classification approach, with additional innovative techniques, that performs better than the traditional naive Bayes classifier. Our research goal is to develop an enhanced Bayesian probabilistic classifier by introducing ranking algorithms based on different tournament structures, together with a high-relevance keyword extraction facility and a facility for accurately calculated weighting factors. These were designed to improve classification performance on specific datasets with different characteristics. Other studies have used general-purpose datasets, such as Reuters-21578 and 20_newsgroups, to validate the performance of their classifiers.
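The baseline that the abstract compares against is the ordinary naive Bayes classifier. As a minimal sketch (not the authors' enhanced classifier), a multinomial naive Bayes text classifier with Laplace (add-one) smoothing could look like the following. The whitespace tokenizer, the smoothing choice, the names `tokenize`, `MultinomialNaiveBayes` and `log_scores`, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed preprocessing: lower-case and split on whitespace only.
    return text.lower().split()


class MultinomialNaiveBayes:
    """Hypothetical baseline: multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # number of documents per class
        self.word_counts = defaultdict(Counter)   # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_scores(self, doc):
        # log P(c) + sum over tokens of log P(w | c); equals log P(c | doc) up to a constant.
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Toy training data (illustrative only).
docs = ["cheap pills buy now", "buy cheap watches",
        "meeting agenda attached", "project meeting notes"]
labels = ["spam", "spam", "work", "work"]
clf = MultinomialNaiveBayes().fit(docs, labels)
print(clf.predict("buy pills"))      # spam
print(clf.predict("meeting notes"))  # work
```

Working in log space avoids numerical underflow when a document contains many words; skipping out-of-vocabulary words is one common choice among several.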
Our approach is easily adapted to datasets with different characteristics in terms of the degree of similarity between classes, multi-categorised documents, and different dataset organisations. As previously mentioned, we introduce several techniques, such as tournament-structure ranking algorithms, higher-relevance keyword extraction, and automatically computed document-dependent (ACDD) weighting factors. Each technique responds differently when applied to datasets with different characteristics, but the techniques have been shown to give outstanding performance in most cases. Based on our experimental results, we have successfully optimised our techniques for individual datasets with different characteristics.
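The abstract does not spell out how its tournament-structure ranking works. Purely as a hypothetical illustration of the general idea, a single-elimination (knockout) tournament that ranks candidate classes through pairwise matches could be sketched as follows. The function name `knockout_ranking`, the `beats` comparator, the bracket order, the bye rule, and the example scores are all assumptions, not the authors' algorithm.

```python
from typing import Callable, List, Sequence


def knockout_ranking(classes: Sequence[str],
                     beats: Callable[[str, str], bool]) -> List[List[str]]:
    """Rank classes with a single-elimination (knockout) tournament.

    `beats(a, b)` is a hypothetical pairwise comparator that returns True when
    class `a` wins its match against class `b`. The result is a list of tiers:
    the champion first, then classes knocked out in later rounds ahead of
    classes knocked out in earlier rounds.
    """
    if not classes:
        return []
    remaining = list(classes)
    losers_by_round: List[List[str]] = []
    while len(remaining) > 1:
        winners, losers = [], []
        # Pair neighbours in bracket order: (0, 1), (2, 3), ...
        for i in range(0, len(remaining) - 1, 2):
            a, b = remaining[i], remaining[i + 1]
            if beats(a, b):
                winners.append(a)
                losers.append(b)
            else:
                winners.append(b)
                losers.append(a)
        if len(remaining) % 2 == 1:
            winners.append(remaining[-1])  # odd class out receives a bye
        losers_by_round.append(losers)
        remaining = winners
    # Later-round losers rank above earlier-round losers.
    return [remaining] + list(reversed(losers_by_round))


# Hypothetical per-class scores for a single document (illustrative values only).
scores = {"sport": -12.0, "politics": -9.5, "tech": -11.2, "health": -10.1}
ranking = knockout_ranking(["sport", "politics", "tech", "health"],
                           lambda a, b: scores[a] > scores[b])
print(ranking)  # [['politics'], ['health'], ['sport', 'tech']]
```

With a comparator that is a consistent total order, as in this demo, the champion is simply the highest-scoring class. The bracket structure only changes the outcome when each match is decided by something that need not be transitive, for example a separately trained two-class model per pair (again an assumption).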

Keywords

Text classification, Bayesian filtering, Probability, Case-based reasoning, Bayes Theorem

Journal title

Computer and Information Science

Serial Year

2008

Record number

678257

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=10&DC=678257