Title :
Exploiting False Discoveries -- Statistical Validation of Patterns and Quality Measures in Subgroup Discovery
Author :
Duivesteijn, Wouter ; Knobbe, Arno
Author_Institution :
LIACS, Leiden Univ., Leiden, Netherlands
Abstract :
Subgroup discovery suffers from the multiple comparisons problem: we search through a large space, hence whenever we report a set of discoveries, this set will generally contain false discoveries. We propose a method to compare subgroups found through subgroup discovery with a statistical model we build for these false discoveries. We determine how much the subgroups we find deviate from the model, and hence statistically validate the found subgroups. Furthermore, we propose to use this subgroup validation to objectively compare quality measures used in subgroup discovery, by determining how much the top subgroups we find with each measure deviate from the statistical model generated with that measure. We thus aim to determine how good individual measures are at selecting significant findings. We apply our method to experimentally compare popular quality measures in several subgroup discovery settings.
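The validation idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which quality measure, randomization scheme, or distribution the authors use, so everything below is an assumption made for illustration: weighted relative accuracy (WRAcc) as the quality measure, random permutation of the target to generate false discoveries, a normal distribution fitted to the best quality found per permutation, and a toy dataset. All function names are hypothetical.

```python
import random
import statistics
from math import erf, sqrt


def wracc(covered, target):
    """WRAcc of a subgroup: coverage * (target share inside - overall share)."""
    n = len(target)
    n_sub = sum(covered)
    if n_sub == 0:
        return 0.0
    p_all = sum(target) / n
    p_sub = sum(t for c, t in zip(covered, target) if c) / n_sub
    return (n_sub / n) * (p_sub - p_all)


def best_quality(candidates, target):
    """Quality of the best subgroup in a (tiny, assumed) search space."""
    return max(wracc(c, target) for c in candidates)


def false_discovery_model(candidates, target, n_perm=200, seed=0):
    """Assumed model: mean and standard deviation of the best quality found
    when the target is shuffled, so that any subgroup found is false."""
    rng = random.Random(seed)
    shuffled = list(target)
    scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        scores.append(best_quality(candidates, shuffled))
    return statistics.mean(scores), statistics.stdev(scores)


def p_value(quality, mu, sigma):
    """One-sided p-value of a quality under the fitted normal model."""
    z = (quality - mu) / sigma
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


# Toy data (assumption): two nominal attributes with four values each;
# the binary target is much more frequent when attribute a equals 0.
data_rng = random.Random(1)
a = [data_rng.randrange(4) for _ in range(200)]
b = [data_rng.randrange(4) for _ in range(200)]
target = [1 if data_rng.random() < (0.9 if ai == 0 else 0.2) else 0 for ai in a]

# Candidate subgroups: one per attribute-value condition.
candidates = [[int(x == v) for x in col] for col in (a, b) for v in range(4)]

observed = wracc(candidates[0], target)  # quality of subgroup "a = 0"
mu, sigma = false_discovery_model(candidates, target)
p = p_value(observed, mu, sigma)
print(f"observed={observed:.3f} mu={mu:.3f} sigma={sigma:.3f} p={p:.3g}")
```

Under these assumptions, the same procedure could be repeated with a different quality measure in place of `wracc` to see how far its top subgroups deviate from its own false-discovery model, which is the comparison the abstract proposes.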
Keywords :
data mining; statistical analysis; false discoveries; quality measures; statistical model; statistical pattern validation; subgroup discovery; Association rules; Complexity theory; Histograms; Search problems; Silicon; Size measurement; Statistical validation
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4577-2075-8
DOI :
10.1109/ICDM.2011.65