DocumentCode :
2982472
Title :
Efficient Algorithms for Finding Richer Subgroup Descriptions in Numeric and Nominal Data
Author :
Mampaey, M. ; Nijssen, Siegfried ; Feelders, A. ; Knobbe, Arno
Author_Institution :
LIACS, Leiden Univ., Leiden, Netherlands
fYear :
2012
fDate :
10-13 Dec. 2012
Firstpage :
499
Lastpage :
508
Abstract :
Subgroup discovery systems are concerned with finding interesting patterns in labeled data. How these systems deal with numeric and nominal data has a large impact on the quality of their results. In this paper, we consider two ways to extend the standard pattern language of subgroup discovery: using conditions that test for interval membership for numeric attributes, and value set membership for nominal attributes. We assume a greedy search setting, that is, iteratively refining a given subgroup, with respect to a (convex) quality measure. For numeric attributes, we propose an algorithm that finds the optimal interval in linear (rather than quadratic) time, with respect to the number of examples and split points. Similarly, for nominal attributes, we show that finding the optimal set of values can be achieved in linear (rather than exponential) time, with respect to the number of examples and the size of the domain of the attribute. These algorithms operate by only considering subgroup refinements that lie on a convex hull in ROC space, thus significantly narrowing down the search space. We further provide efficient algorithms specifically for the popular Weighted Relative Accuracy quality measure, taking advantage of some of its properties. Our algorithms are shown to perform well in practice, and furthermore provide additional expressive power leading to higher-quality results.
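The abstract's nominal-attribute idea (only refinements on the ROC convex hull need to be checked) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes each nominal value v is summarized by a hypothetical pair (p_v, n_v) of covered positives and negatives. Every value set then maps to a subset sum of these vectors in ROC space. The vertices of the convex hull of all subset sums are the prefixes of the values sorted by decreasing p/n (upper chain) and by increasing p/n (lower chain). A convex quality measure therefore reaches its maximum at one of these at most 2k candidates instead of 2^k subsets. The names `wracc`, `best_value_set`, `Counts` and `Quality` are illustrative. The sketch sorts the domain, so it does not reproduce the linear-time bound claimed in the paper.

```python
from fractions import Fraction
from functools import cmp_to_key
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical input format: nominal value -> (positives covered, negatives covered)
Counts = Dict[str, Tuple[int, int]]
# Quality measure signature assumed here: quality(p, n, P, N)
Quality = Callable[[int, int, int, int], Fraction]


def wracc(p: int, n: int, P: int, N: int) -> Fraction:
    """Weighted Relative Accuracy, (p+n)/T * (p/(p+n) - P/T) = (p*T - (p+n)*P) / T^2."""
    T = P + N
    return Fraction(p * T - (p + n) * P, T * T)


def best_value_set(counts: Counts, quality: Quality) -> Tuple[Set[str], Optional[Fraction]]:
    """Return the value set maximizing a convex quality measure, evaluating only
    the hull vertices (prefixes along both chains) rather than all subsets."""
    P = sum(p for p, _ in counts.values())
    N = sum(n for _, n in counts.values())
    # Values that cover no examples cannot change the subgroup, so drop them.
    vals = [v for v, (p, n) in counts.items() if p + n > 0]

    def by_slope(a: str, b: str) -> int:
        # Sorts by decreasing p/n using a cross product, avoiding division by zero.
        (pa, na), (pb, nb) = counts[a], counts[b]
        return pb * na - pa * nb

    order = sorted(vals, key=cmp_to_key(by_slope))
    best: Set[str] = set()
    best_q: Optional[Fraction] = None
    # Upper chain: prefixes in decreasing-slope order.
    # Lower chain: prefixes in increasing-slope order.
    for chain in (order, order[::-1]):
        p = n = 0
        for i, v in enumerate(chain):
            p += counts[v][0]
            n += counts[v][1]
            q = quality(p, n, P, N)
            if best_q is None or q > best_q:
                best, best_q = set(chain[: i + 1]), q
    return best, best_q
```

For example, `best_value_set({"a": (8, 2), "b": (1, 9), "c": (5, 5), "d": (6, 4)}, wracc)` selects `{"a", "d"}` with a WRAcc of 1/10. WRAcc is linear in (p, n), so each value contributes additively. For that measure alone, the optimum therefore reduces to collecting the values with a positive individual contribution, which is the kind of property the abstract says the WRAcc-specific algorithms exploit.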
Keywords :
data mining; ROC space; convex hull; greedy search setting; interesting patterns; interval membership; labeled data; linear time; nominal attributes; nominal data; numeric attributes; numeric data; standard pattern language; subgroup descriptions; subgroup discovery systems; value set membership; weighted relative accuracy quality measure; Accuracy; Complexity theory; Context; Data mining; Decision trees; Gain measurement; Weight measurement; ROC analysis; convex functions; nominal data; numeric data; subgroup discovery;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
ISSN :
1550-4786
Print_ISBN :
978-1-4673-4649-8
Type :
conf
DOI :
10.1109/ICDM.2012.117
Filename :
6413744