Title :
Generalization and decision tree induction: efficient classification in data mining
Author :
Kamber, Micheline ; Winstone, Lara ; Gong, Wan ; Cheng, Shan ; Han, Jiawei
Author_Institution :
Sch. of Comput. Sci., Simon Fraser Univ., Burnaby, BC, Canada
Abstract :
Efficiency and scalability are fundamental issues in data mining over large databases. Although classification has been studied extensively, few of the known methods seriously consider efficient induction in large databases or the analysis of data at multiple abstraction levels. This paper addresses the efficiency and scalability issues by proposing a data classification method that integrates attribute-oriented induction, relevance analysis, and the induction of decision trees. This integration leads to efficient, high-quality, multiple-level classification of large amounts of data, relaxes the requirement of perfect training sets, and handles continuous and noisy data elegantly.
Keywords :
classification; decision theory; deductive databases; generalisation (artificial intelligence); knowledge acquisition; trees (mathematics); very large databases; attribute oriented induction; data classification method; data mining; decision tree induction; efficient induction; generalization; large databases; multiple abstraction levels; multiple level classification; noisy data; perfect training sets; relevance analysis; scalability issues; Application software; Classification tree analysis; Data analysis; Data mining; Database systems; Decision making; Decision trees; Laboratories; Scalability; Testing;
Conference_Title :
Proceedings of the Seventh International Workshop on Research Issues in Data Engineering, 1997
Conference_Location :
Birmingham
Print_ISBN :
0-8186-7849-6
DOI :
10.1109/RIDE.1997.583715