Title :
Mining insightful classification rules directly and efficiently
Author :
Liu, Hongyan ; Chen, Jim ; Chen, Guoqing
Author_Institution :
Sch. of Econ. & Manage., Tsinghua Univ., Beijing, China
fDate :
1999
Abstract :
Classification is one of the important problems in data mining. Many algorithms have been proposed to solve it, and each has its own drawbacks. This paper discusses issues in mining classification rules directly and proposes two algorithms, UARC and GARC. These algorithms use a more suitable association rule mining technique to find a complete set of insightful rules directly and accurately. Unlike most other association rule mining algorithms, the proposed algorithms find both frequent k-itemsets and rules in the same step. After each scan of the database, only rule itemsets and excluded itemsets are saved; these are used to exclude many more itemsets when generating larger candidate itemsets, which saves considerable computation time and memory. Using the information gain criterion, many training cases that satisfy a special condition can be deleted from the database, which reduces I/O for every remaining scan of the database. Finally, a criterion is defined to terminate the whole mining process much earlier and at the same time produce a meaningful rule
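The abstract names the information gain criterion without defining it. As a minimal sketch, the standard entropy-based definition can be computed as below. This is the textbook formula, not the UARC or GARC procedure itself; the function names, the `class` key, and the toy training cases are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(cases, attribute, label_key="class"):
    """Entropy reduction obtained by splitting `cases` on `attribute`.

    `cases` is a list of dicts; `label_key` names the class field
    (an assumed layout, not one taken from the paper).
    """
    labels = [case[label_key] for case in cases]
    # Group the class labels by the value each case has for the attribute.
    partitions = {}
    for case in cases:
        partitions.setdefault(case[attribute], []).append(case[label_key])
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        len(part) / len(cases) * entropy(part) for part in partitions.values()
    )
    return entropy(labels) - remainder


# Hypothetical training cases: "outlook" determines the class, "windy" does not.
cases = [
    {"outlook": "sunny", "windy": "no", "class": "stay"},
    {"outlook": "sunny", "windy": "yes", "class": "stay"},
    {"outlook": "rain", "windy": "no", "class": "go"},
    {"outlook": "rain", "windy": "yes", "class": "go"},
]

gain_outlook = information_gain(cases, "outlook")
gain_windy = information_gain(cases, "windy")
```

In this toy data, splitting on `outlook` yields pure partitions, so its gain equals the full entropy of the class labels, while `windy` leaves the class distribution unchanged and gains nothing. The special condition under which the paper deletes training cases is not given in the abstract and is not modeled here.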
Keywords :
data mining; pattern classification; very large databases; GARC; UARC; association rule mining; classification rule mining; computation time; frequent k-itemset; information gain criterion; large database; training cases; Association rules; Classification algorithms; Classification tree analysis; Costs; Databases; Decision trees; Itemsets; Testing; Training data
Conference_Title :
Systems, Man, and Cybernetics, 1999. IEEE SMC '99 Conference Proceedings. 1999 IEEE International Conference on
Conference_Location :
Tokyo
Print_ISBN :
0-7803-5731-0
DOI :
10.1109/ICSMC.1999.823349