DocumentCode
1948969
Title
Concept Description - A Fresh Look
Author
Sönströd, Cecilia ; Johansson, Ulf
Author_Institution
Boras Univ., Boras
fYear
2007
fDate
12-17 Aug. 2007
Firstpage
2415
Lastpage
2420
Abstract
The main purpose of this paper is to examine the data mining task of concept description, for which several rather different definitions exist. We argue for the definition used by CRISP-DM, where the overall goal is expressed as "gaining insights". Based on this, we propose that the two most important criteria for concept description models are accuracy and comprehensibility. The demand for comprehensibility rules out a straightforward use of many high-accuracy predictive modeling techniques, e.g., neural networks. Instead, we introduce rule extraction from predictive models as an alternative technique for concept description. In experiments on ten publicly available data sets, we show that the rule extractor used is clearly able to produce accurate and comprehensible descriptions. In addition, we discuss how concept description performance could be measured to capture both accuracy and comprehensibility. Comprehensibility is often translated into size, i.e., a smaller model is deemed more comprehensible. In practice, however, it would probably make more sense to treat comprehensibility as a binary property: the description is either comprehensible or not. Regarding accuracy, we argue that accuracies obtained on unseen data provide better information than accuracy on the entire data set. The reason is not that the model should be used for prediction, but that concepts found in this way are more likely to be general, and thus more informative.
Keywords
data mining; concept comprehensibility; data mining task concept description; rule extraction; Advertising; Artificial neural networks; Data mining; Measurement standards; Neural networks; Predictive models; Support vector machines; Testing
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks, 2007. IJCNN 2007. International Joint Conference on
Conference_Location
Orlando, FL
ISSN
1098-7576
Print_ISBN
978-1-4244-1379-9
Electronic_ISBN
1098-7576
Type
conf
DOI
10.1109/IJCNN.2007.4371336
Filename
4371336