DocumentCode :
800860
Title :
Highly scalable and robust rule learner: performance evaluation and comparison
Author :
Kurgan, Lukasz A. ; Cios, Krzysztof J. ; Dick, Scott
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Alberta, Edmonton, Alta., Canada
Volume :
36
Issue :
1
fYear :
2006
Firstpage :
32
Lastpage :
53
Abstract :
Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or the crafting of real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact and comprehensible, and their accuracy is comparable to that of rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency and resistance to missing data. DataSqueezer exhibits log-linear asymptotic complexity in the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.
Keywords :
competitive intelligence; computational complexity; data mining; decision support systems; greedy algorithms; learning (artificial intelligence); very large databases; DataSqueezer rule learner; bioinformatics; business intelligence; data mining system; dataset mining; greedy rule builder; inductive supervised rule extraction algorithm; log-linear asymptotic complexity; machine learning; real-time enterprise-level decision support system; Bioinformatics; Companies; Computer science; Data mining; Decision support systems; Decision trees; Drugs; Production; Real time systems; Robustness; Complexity; DataSqueezer; data mining; machine learning; missing data; rule induction; rule learner; Algorithms; Artificial Intelligence; Database Management Systems; Databases, Factual; Decision Support Techniques; Information Storage and Retrieval;
fLanguage :
English
Journal_Title :
Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on
Publisher :
IEEE
ISSN :
1083-4419
Type :
jour
DOI :
10.1109/TSMCB.2005.852983
Filename :
1580617