DocumentCode :
3124146
Title :
Exploiting False Discoveries -- Statistical Validation of Patterns and Quality Measures in Subgroup Discovery
Author :
Duivesteijn, Wouter ; Knobbe, Arno
Author_Institution :
LIACS, Leiden Univ., Leiden, Netherlands
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
151
Lastpage :
160
Abstract :
Subgroup discovery suffers from the multiple comparisons problem: we search through a large space, hence whenever we report a set of discoveries, this set will generally contain false discoveries. We propose a method to compare subgroups found through subgroup discovery with a statistical model we build for these false discoveries. We determine how much the subgroups we find deviate from the model, and hence statistically validate the found subgroups. Furthermore, we propose to use this subgroup validation to objectively compare quality measures used in subgroup discovery, by determining how much the top subgroups we find with each measure deviate from the statistical model generated with that measure. We thus aim to determine how good individual measures are in selecting significant findings. We invoke our method to experimentally compare popular quality measures in several subgroup discovery settings.
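The abstract does not say how the model of false discoveries is constructed. As a minimal sketch only, the idea of validating a subgroup against such a model might look as follows, assuming the model is built by shuffling the target labels (which removes any real association with the attributes), assuming weighted relative accuracy (WRAcc) as the quality measure, and assuming an empirical p-value as the measure of deviation. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import random

def wracc(covered, target):
    """Weighted relative accuracy of a subgroup given as a boolean cover mask."""
    n = len(target)
    n_sub = sum(covered)
    if n_sub == 0:
        return 0.0
    p_all = sum(target) / n
    p_sub = sum(t for c, t in zip(covered, target) if c) / n_sub
    return (n_sub / n) * (p_sub - p_all)

def best_quality(covers, target):
    """Quality of the best candidate subgroup (a stand-in for a full search)."""
    return max(wracc(c, target) for c in covers)

def null_best_qualities(covers, target, runs=99, seed=0):
    """Best quality found on label-shuffled data, repeated `runs` times.

    Assumption: shuffling the target yields data in which every
    discovery is a false discovery.
    """
    rng = random.Random(seed)
    shuffled = list(target)
    result = []
    for _ in range(runs):
        rng.shuffle(shuffled)
        result.append(best_quality(covers, shuffled))
    return result

def empirical_p_value(quality, null_qualities):
    """Share of null runs at least as good as `quality` (with +1 correction)."""
    hits = sum(1 for q in null_qualities if q >= quality)
    return (hits + 1) / (len(null_qualities) + 1)

# Toy data: the first candidate subgroup matches the target exactly,
# the other candidates are unrelated to it.
n = 200
covers = [[(i // k) % 2 == 0 for i in range(n)] for k in range(1, 6)]
target = [1 if c else 0 for c in covers[0]]

observed = best_quality(covers, target)
null = null_best_qualities(covers, target)
p_value = empirical_p_value(observed, null)
```

Because the null distribution here is built from the best subgroup of each shuffled run rather than from a single fixed subgroup, the comparison accounts for the search over many candidates, which is the multiple comparisons concern raised in the abstract. Repeating the procedure with a different quality function in place of `wracc` would give one way to compare measures in this sketch.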
Keywords :
data mining; statistical analysis; false discoveries; quality measures; statistical model; statistical pattern validation; subgroup discovery; Association rules; Complexity theory; Histograms; Search problems; Silicon; Size measurement; Statistical validation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1550-4786
Print_ISBN :
978-1-4577-2075-8
Type :
conf
DOI :
10.1109/ICDM.2011.65
Filename :
6137219