• DocumentCode
    468157
  • Title

    Learning Selective Averaged One-Dependence Estimators for Probability Estimation

  • Author

    Wang, Qing ; Zhou, Chuan-Hua ; Guo, Jian-Kui

  • Author_Institution
    Anhui Univ. of Technol., Anhui
  • Volume
    1
  • fYear
    2007
  • fDate
    24-27 Aug. 2007
  • Firstpage
    492
  • Lastpage
    496
  • Abstract
    Naive Bayes is a well-known, effective, and efficient classification algorithm, but its probability estimation performance is poor. Averaged one-dependence estimators (AODE) is a recently proposed semi-naive Bayes algorithm that demonstrates significantly high classification accuracy at a modest cost. In many data mining applications, however, accurate probability estimation is more desirable when making optimal decisions. Probability estimation performance is usually measured by conditional log likelihood (CLL). In this paper, we first study the probability estimation performance of AODE and compare it to naive Bayes, tree-augmented naive Bayes, CLLTree, C4.4 (the improved version of C4.5 for better probability estimation), and support vector machines. From our experiments, we find that AODE performs significantly better than all of the compared algorithms except C4.4, and only slightly better than C4.4, although its classification accuracy is significantly better than that of C4.5. We then propose an efficient forward greedy feature selection algorithm for AODE that uses the CLL score for attribute selection. The experimental results show that our algorithm achieves a substantial improvement over AODE and significantly outperforms C4.4. Our experiments are conducted on 36 UCI data sets that cover a wide range of domains and data characteristics, and we run all the algorithms within the Weka platform.
  • Keywords
    Bayes methods; estimation theory; pattern classification; probability; support vector machines; trees (mathematics); C4.4; CLLTree; Weka platform; averaged one-dependence estimators; classification algorithm; conditional log likelihood; data mining applications; forward greedy feature selection algorithm; probability estimation; semi-naive Bayes algorithm; support vector machines; tree-augmented naive Bayes; Classification algorithms; Costs; Data mining; Engineering management; Equations; Information technology; Machine learning; Support vector machine classification; Support vector machines; Technology management;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
  • Conference_Location
    Haikou
  • Print_ISBN
    978-0-7695-2874-8
  • Type

    conf

  • DOI
    10.1109/FSKD.2007.384
  • Filename
    4405974
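
The abstract names two ingredients: a CLL score and forward greedy attribute selection driven by it. The sketch below illustrates both in generic form. It is not the authors' implementation: the function names, the dictionary representation of class distributions, the clamping constant `eps`, and the stopping rule (stop when no candidate improves the score) are all assumptions made for illustration, and the `score` callback stands in for "train AODE on this attribute subset and return its CLL", which is not implemented here.

```python
import math
from typing import Callable, Dict, Hashable, List, Sequence


def conditional_log_likelihood(
    predicted: Sequence[Dict[Hashable, float]],
    true_labels: Sequence[Hashable],
    eps: float = 1e-12,  # assumed floor to avoid log(0)
) -> float:
    """Sum of log P(true class | instance) over a data set (higher is better)."""
    total = 0.0
    for dist, label in zip(predicted, true_labels):
        # Clamp so a zero probability estimate does not raise a math error.
        total += math.log(max(dist.get(label, 0.0), eps))
    return total


def forward_greedy_select(
    attributes: Sequence[str],
    score: Callable[[List[str]], float],  # hypothetical: e.g. CLL of a model on the subset
) -> List[str]:
    """Repeatedly add the attribute that most improves `score`; stop when none does."""
    selected: List[str] = []
    remaining = list(attributes)
    best = -math.inf
    while remaining:
        # Score every one-attribute extension of the current subset.
        candidates = [(score(selected + [a]), a) for a in remaining]
        top_score, top_attr = max(candidates)
        if top_score <= best:
            break  # no extension improves the score
        selected.append(top_attr)
        remaining.remove(top_attr)
        best = top_score
    return selected
```

In use, `score` would typically evaluate the candidate subset on held-out data or by cross-validation, so that selection does not simply reward fitting the training set.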