• DocumentCode
    1021897
  • Title

    Theoretical and practical considerations of uncertainty and complexity in automated knowledge acquisition

  • Author

    Zhou, Xiao-Jia M. ; Dillon, Tharam S.

  • Author_Institution
    Dept. of Comput. Sci. & Comput. Eng., La Trobe Univ., Bundoora, Vic., Australia
  • Volume
    7
  • Issue
    5
  • fYear
    1995
  • fDate
    10/1/1995 12:00:00 AM
  • Firstpage
    699
  • Lastpage
    712
  • Abstract
    Inductive machine learning has become an important approach to automated knowledge acquisition from databases. The disjunctive normal form (DNF), as the common analytic representation of decision trees and decision tables (rules), provides a basis for formal analysis of uncertainty and complexity in inductive learning. A theory for general decision trees is developed based on C. Shannon's (1949) expansion of the discrete DNF, and a probabilistic induction system, PIK, is further developed for extracting knowledge from real-world data. We then combine formal and practical approaches to study how data characteristics affect the uncertainty and complexity in inductive learning. Three important data characteristics, namely disjunctiveness, noise and incompleteness, are studied. The combination of leveled pruning, leveled condensing and resampling estimation turns out to be a very powerful method for dealing with highly disjunctive and inadequate data. Finally, the PIK system is compared with other recent inductive learning systems on a number of real-world domains.
  • Keywords
    computational complexity; decision theory; knowledge acquisition; learning by example; uncertainty handling; automated knowledge acquisition; common analytic representation; data characteristics; decision tables; decision trees; discrete DNF; disjunctive normal form; disjunctiveness; formal analysis; general decision trees; incompleteness; inductive learning; inductive learning systems; inductive machine learning; knowledge extraction; leveled condensing; leveled pruning; probabilistic induction system PIK; real world data; real world domains; resampling estimation; uncertainty; Binary trees; Boolean functions; Computer science; Data mining; Decision trees; Knowledge acquisition; Learning systems; Machine learning; Probability distribution; Uncertainty;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/69.469826
  • Filename
    469826