Concept Description - A Fresh Look

Author

Sönströd, Cecilia ; Johansson, Ulf

Author_Institution

Boras Univ., Boras

fYear

2007

fDate

12-17 Aug. 2007

Firstpage

2415

Lastpage

2420

Abstract

The main purpose of this paper is to look into the data mining task concept description, for which several rather different definitions exist. We argue for the definition used by CRISP-DM, where the overall goal is expressed as "gaining insights". Based on this, we propose that the two most important criteria for concept description models are accuracy and comprehensibility. The demand for comprehensibility rules out a straightforward use of many high-accuracy predictive modeling techniques, e.g., neural networks. Instead, we introduce rule extraction from predictive models as an alternative technique for concept description. In the experimentation, we show, using ten publicly available data sets, that the rule extractor used is clearly able to produce accurate and comprehensible descriptions. In addition, we discuss how concept description performance could be measured to capture both accuracy and comprehensibility. Comprehensibility is often translated into size, i.e., a smaller model is deemed more comprehensible. In practice, however, it would probably make more sense to treat comprehensibility as a binary property: the description is either comprehensible or not. Regarding accuracy, we argue that accuracies obtained on unseen data provide better information than accuracy on the entire data set. The reason is not that the model should be used for prediction, but that concepts found in this way are more likely to be general, and thus more informative.
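
The two criteria named in the abstract can be illustrated with a minimal sketch. This is not the authors' rule extractor or evaluation protocol. The names `Rule`, `describe`, `accuracy` and `is_comprehensible`, the toy records, and the five-rule threshold are all assumptions made for illustration only. The sketch scores a small rule list on held-out records rather than on the entire data set, and treats comprehensibility as a binary property derived from size.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Labeled = Tuple[Record, str]


@dataclass
class Rule:
    """One IF-THEN rule of a hypothetical concept description."""
    condition: Callable[[Record], bool]
    label: str
    text: str  # human-readable form of the rule


def describe(rules: List[Rule], default: str, record: Record) -> str:
    """Return the label of the first rule that fires, else the default."""
    for rule in rules:
        if rule.condition(record):
            return rule.label
    return default


def accuracy(rules: List[Rule], default: str, data: List[Labeled]) -> float:
    """Fraction of records whose label the description reproduces."""
    hits = sum(describe(rules, default, x) == y for x, y in data)
    return hits / len(data)


def is_comprehensible(rules: List[Rule], max_rules: int = 5) -> bool:
    """Binary comprehensibility; the size threshold is an assumed value."""
    return len(rules) <= max_rules


# Toy records invented for this sketch (not taken from the paper).
train: List[Labeled] = [
    ({"income": 60, "age": 30}, "yes"),
    ({"income": 20, "age": 45}, "no"),
    ({"income": 75, "age": 50}, "yes"),
    ({"income": 30, "age": 22}, "no"),
    ({"income": 55, "age": 41}, "yes"),
    ({"income": 40, "age": 35}, "no"),
]
held_out: List[Labeled] = [
    ({"income": 80, "age": 29}, "yes"),
    ({"income": 25, "age": 60}, "no"),
    ({"income": 52, "age": 33}, "no"),
    ({"income": 45, "age": 38}, "no"),
]

# A one-rule description, standing in for the output of some rule extractor.
rules = [Rule(lambda r: r["income"] > 50, "yes", "IF income > 50 THEN yes")]

train_acc = accuracy(rules, "no", train)        # 1.0 on the records it was built from
unseen_acc = accuracy(rules, "no", held_out)    # 0.75 on unseen records
comprehensible = is_comprehensible(rules)       # True: one rule is within the threshold
```

In this toy setting the rule looks perfect on the records it was derived from, while the held-out score shows that it generalizes less well. That gap is the kind of information the abstract argues accuracy on unseen data provides.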

Keywords

data mining; concept comprehensibility; data mining task concept description; rule extraction; Advertising; Artificial neural networks; Data mining; Measurement standards; Neural networks; Predictive models; Support vector machines; Testing

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks, 2007. IJCNN 2007. International Joint Conference on

Conference_Location

Orlando, FL

ISSN

1098-7576

Print_ISBN

978-1-4244-1379-9

Electronic_ISBN

1098-7576

Type

conf

DOI

10.1109/IJCNN.2007.4371336

Filename

4371336