DocumentCode :
1317202
Title :
Higher Order Naïve Bayes: A Novel Non-IID Approach to Text Classification
Author :
Ganiz, Murat Can; George, Cibin; Pottenger, William M.
Author_Institution :
Department of Computer Engineering, Dogus University, Istanbul, Turkey
Volume :
23
Issue :
7
fYear :
2011
fDate :
7/1/2011
Firstpage :
1022
Lastpage :
1034
Abstract :
The underlying assumption in traditional machine learning algorithms is that instances are independent and identically distributed (IID). These critical independence assumptions prevent traditional algorithms from going beyond instance boundaries to exploit latent relations between features. In this paper, we develop a general approach to supervised learning by leveraging higher order dependencies between features. We introduce a novel Bayesian framework for classification termed Higher Order Naïve Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages higher order relations between features across different instances. The approach is validated in the classification domain on widely used benchmark data sets. Results obtained on several benchmark text corpora demonstrate that higher order approaches achieve significant improvements in classification accuracy over the baseline methods, especially when training data are scarce. A complexity analysis also reveals that the space and time complexity of HONB compare favorably with those of existing approaches.
Keywords :
Bayes methods; classification; learning (artificial intelligence); text analysis; higher order naïve Bayes; independent and identically distributed; machine learning; non-IID approach; supervised learning; text classification; Bayesian methods; Classification algorithms; Large scale integration; Machine learning; Machine learning algorithms; Mathematical model; Training data; IID; naïve Bayes; statistical relational learning
fLanguage :
English
Journal_Title :
IEEE Transactions on Knowledge and Data Engineering
Publisher :
IEEE
ISSN :
1041-4347
Type :
Journal article
DOI :
10.1109/TKDE.2010.160
Filename :
5567099