Title of article :
Polychotomiser for Case-based Reasoning beyond the Traditional Bayesian Classification Approach
Author/Authors :
Dino Isa, Lam Hong Lee, V.P. Kallimani, R. Prasad
Issue Information :
Journal with serial number, year 2008
Pages :
12
From page :
57
To page :
68
Abstract :
This work implements an enhanced Bayesian classifier that performs better than the ordinary naive Bayes classifier on domains and datasets of varying characteristics. Text classification is an active, ongoing research field of Artificial Intelligence (AI). It is defined as the task of learning methods for categorising collections of electronic text documents into their annotated classes based on their contents. An increasing number of statistical approaches have been developed for text classification, including k-nearest neighbor classification, naive Bayes classification, decision trees, rule induction, and the support vector machine (SVM), an algorithm that implements structural risk minimisation theory. Among these approaches, naive Bayes classifiers have been widely used because of their simplicity. However, this generative method has been reported to be less accurate than discriminative methods such as the SVM. Some studies have shown that the naive Bayes classifier performs surprisingly well in many other domains with certain specialised characteristics. The main aim of this work is to quantify the weakness of traditional naive Bayes classification and to introduce an enhanced Bayesian classification approach, with additional techniques, that performs better than the traditional naive Bayes classifier. Our research goal is to develop an enhanced Bayesian probabilistic classifier by introducing different tournament-structure ranking algorithms, together with a high-relevance keyword extraction facility and a facility for accurately calculated weighting factors. These were introduced to improve classification performance on specific datasets with different characteristics. Other researchers have used general datasets, such as Reuters-21578 and 20_newsgroups, to validate the performance of their classifiers.
Our approach is easily adapted to datasets with different characteristics in terms of the degree of similarity between classes, multi-categorised documents, and different dataset organisations. As mentioned above, we introduce several techniques: tournament-structure ranking algorithms, higher-relevance keyword extraction, and automatically computed document dependent (ACDD) weighting factors. Each technique responds differently when applied to datasets with different characteristics, but each has been shown to give outstanding performance in most cases. Based on our experimental results, we have successfully optimised our techniques for individual datasets with different characteristics.
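
For orientation, the traditional naive Bayes baseline that the abstract compares against can be sketched roughly as follows. This is a minimal illustration only, assuming a multinomial word-count model with add-one smoothing and whitespace tokenisation; it is not the authors' enhanced classifier and does not include the tournament ranking, keyword extraction, or ACDD weighting described above. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and words per class.

    docs is a list of (text, label) pairs (hypothetical toy format).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Log prior: fraction of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothed word likelihood.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Toy training set, invented purely for illustration.
TOY_DOCS = [
    ("stock market shares trading", "finance"),
    ("bank interest rates market", "finance"),
    ("football match goal score", "sport"),
    ("tennis match player score", "sport"),
]

model = train_naive_bayes(TOY_DOCS)
print(classify("market shares", model))
```

The sketch treats every word as conditionally independent given the class, which is the simplifying assumption that gives the method its name and its simplicity.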
Keywords :
Text classification , Bayesian filtering , Probability , Case-based reasoning , Bayes Theorem
Journal title :
Computer and Information Science
Serial Year :
2008
Record number :
678257