Learning Selective Averaged One-Dependence Estimators for Probability Estimation

Author

Wang, Qing ; Zhou, Chuan-Hua ; Guo, Jian-Kui

Author_Institution

Anhui Univ. of Technol., Anhui

Volume

1

fYear

2007

fDate

24-27 Aug. 2007

Firstpage

492

Lastpage

496

Abstract

Naive Bayes is a well-known, effective, and efficient classification algorithm, but its probability estimates are poor. Averaged one-dependence estimators (AODE) is a recently proposed semi-naive Bayes algorithm that achieves remarkably high classification accuracy at a modest computational cost. In many data mining applications, however, accurate probability estimation is more desirable, because optimal decisions depend on it. Probability estimation performance is usually measured by conditional log likelihood (CLL). In this paper, we first study the probability estimation performance of AODE and compare it with naive Bayes, tree-augmented naive Bayes, CLLTree, C4.4 (an improved version of C4.5 for better probability estimation), and support vector machines. Our experiments show that AODE performs significantly better than all of these algorithms except C4.4, over which it is only slightly better, even though its classification accuracy is significantly higher than that of C4.5. We then propose an efficient forward greedy feature selection algorithm for AODE that uses the CLL score to select attributes. The experimental results show that our algorithm substantially improves on AODE and significantly outperforms C4.4. All experiments are conducted on 36 UCI data sets that cover a wide range of domains and data characteristics, and all algorithms are run within the Weka platform.
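To make the abstract's three ingredients concrete (AODE's averaged joint estimate, the CLL score CLL = sum over instances t of log P(y_t | x_t), and forward greedy attribute selection driven by CLL), here is a minimal Python sketch. It is not the authors' implementation and does not reproduce Weka's AODE. Assumptions: attributes are discrete integers, probabilities use Laplace smoothing, a super-parent value must occur at least once in training, a selected attribute serves as both super-parent and child, and CLL is scored on a separate validation set. The names `AODE`, `cll`, and `select_attributes` are hypothetical.

```python
import math
from collections import Counter


class AODE:
    """Minimal AODE for discrete attributes (illustrative sketch, not the paper's code).

    Assumptions: attribute values are ints in range(n_values[i]), class labels are
    ints in range(n_classes), Laplace smoothing, and a super-parent frequency limit of 1.
    """

    def __init__(self, n_values, n_classes, attrs=None):
        self.n_values = n_values
        self.n_classes = n_classes
        # Attributes used both as super-parents and as children.
        self.attrs = list(range(len(n_values))) if attrs is None else list(attrs)

    def fit(self, X, y):
        self.n = len(X)
        self.cy = Counter(y)          # count(y)
        self.cxi = Counter()          # count(x_i = v)
        self.cyxi = Counter()         # count(y, x_i = v)
        self.cyxixj = Counter()       # count(y, x_i = v, x_j = w)
        for row, c in zip(X, y):
            for i in self.attrs:
                self.cxi[i, row[i]] += 1
                self.cyxi[c, i, row[i]] += 1
                for j in self.attrs:
                    if j != i:
                        self.cyxixj[c, i, row[i], j, row[j]] += 1
        return self

    def joint(self, row, c):
        """Estimate of P(y=c, x) averaged over the qualifying super-parents."""
        parents = [i for i in self.attrs if self.cxi[i, row[i]] >= 1]
        if not parents:
            # Simplification: fall back to the smoothed class prior
            # (AODE proper falls back to naive Bayes).
            return (self.cy[c] + 1) / (self.n + self.n_classes)
        total = 0.0
        for i in parents:
            v, n_cv = row[i], self.cyxi[c, i, row[i]]
            p = (n_cv + 1) / (self.n + self.n_classes * self.n_values[i])  # P(y, x_i)
            for j in self.attrs:
                if j != i:
                    # Multiply in P(x_j | y, x_i) for every child attribute.
                    p *= (self.cyxixj[c, i, v, j, row[j]] + 1) / (n_cv + self.n_values[j])
            total += p
        return total / len(parents)

    def predict_proba(self, row):
        # Normalise the joint estimates over classes to get P(y | x).
        scores = [self.joint(row, c) for c in range(self.n_classes)]
        z = sum(scores)
        return [s / z for s in scores]


def cll(model, X, y):
    """Conditional log likelihood: sum of log P(y_t | x_t) over the instances."""
    return sum(math.log(model.predict_proba(row)[c]) for row, c in zip(X, y))


def select_attributes(X_tr, y_tr, X_val, y_val, n_values, n_classes):
    """Hypothetical forward greedy selection: add the attribute giving the best CLL."""
    def score(attrs):
        return cll(AODE(n_values, n_classes, attrs).fit(X_tr, y_tr), X_val, y_val)

    selected, best = [], score([])
    remaining = set(range(len(n_values)))
    while remaining:
        new_best, attr = max((score(selected + [a]), a) for a in sorted(remaining))
        if new_best <= best:  # stop once no single addition improves CLL
            break
        best = new_best
        selected.append(attr)
        remaining.remove(attr)
    return selected, best


if __name__ == "__main__":
    X = [(0, 0), (0, 1), (1, 0), (1, 1)] * 2
    y = [row[0] for row in X]  # class equals attribute 0; attribute 1 is noise
    print(select_attributes(X, y, X, y, [2, 2], 2)[0])  # [0]
```

In this toy example the training set doubles as the validation set only to keep it short. Adding the noise attribute lowers CLL (the true-class probability drops from 5/6 to 19/24), so the greedy search keeps attribute 0 alone.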

Keywords

Bayes methods; estimation theory; pattern classification; probability; support vector machines; trees (mathematics); C4.4; CLLTree; Weka platform; averaged one-dependence estimators; classification algorithm; conditional log likelihood; data mining applications; forward greedy feature selection algorithm; probability estimation; semi-naive Bayes algorithm; support vector machines; tree-augmented naive Bayes; Classification algorithms; Costs; Data mining; Engineering management; Equations; Information technology; Machine learning; Support vector machine classification; Support vector machines; Technology management;

fLanguage

English

Publisher

IEEE

Conference_Title

Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on

Conference_Location

Haikou

Print_ISBN

978-0-7695-2874-8

Type

conf

DOI

10.1109/FSKD.2007.384

Filename

4405974