DocumentCode :
3165278
Title :
Itemsets for Real-Valued Datasets
Author :
Tatti, Nikolaj
Author_Institution :
Dept. of Inf. & Comput. Sci., Aalto Univ., Espoo, Finland
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
717
Lastpage :
726
Abstract :
Pattern mining is one of the most well-studied subfields in exploratory data analysis. While there is a significant amount of literature on how to discover and rank itemsets efficiently from binary data, there is surprisingly little research on mining patterns from real-valued data. In this paper we propose a family of quality scores for real-valued itemsets. We approach the problem by casting the dataset into binary data and computing the support from this data. This naive approach requires us to select thresholds. To remedy this, instead of selecting one set of thresholds, we treat the thresholds as random variables and compute the average support. We show that this support can be computed efficiently, and we also introduce two normalisations, namely comparing the support against the independence assumption and, more generally, against the partition assumption. Our experimental evaluation demonstrates that we can discover statistically significant patterns efficiently.
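The central idea of the abstract, averaging an itemset's support over random thresholds, can be illustrated with a small sketch. It is not the paper's method. It assumes a hypothetical threshold model: each attribute a is binarised as x ≥ t_a, and each t_a is drawn uniformly from the observed values of a, independently of the other attributes. Under that assumption, the probability that a row supports the itemset factorises into a product of empirical CDF values, so the average support is a simple sum over rows. The independence ratio divides this by the product of single-attribute average supports. The function names are illustrative, and the partition normalisation is not shown.

```python
from bisect import bisect_right
from math import prod


def empirical_cdf(column):
    """Return F(x) = fraction of values in `column` that are <= x."""
    ordered = sorted(column)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def average_support(data, itemset):
    """Expected support of `itemset` (a list of column indices).

    Assumed model (hypothetical): attribute a is binarised as x >= t_a, where
    t_a is drawn uniformly from the observed values of a, independently per
    attribute.
    """
    cdfs = {a: empirical_cdf([row[a] for row in data]) for a in itemset}
    # For a fixed row: P(x_a >= t_a for all a) = prod_a P(t_a <= x_a) = prod_a F_a(x_a).
    return sum(prod(cdfs[a](row[a]) for a in itemset) for row in data) / len(data)


def independence_ratio(data, itemset):
    """Average support relative to the value expected if attributes were independent.

    Because each single-attribute support depends only on its own threshold,
    the independence baseline is the product of single-attribute average supports.
    """
    baseline = prod(average_support(data, [a]) for a in itemset)
    return average_support(data, itemset) / baseline


if __name__ == "__main__":
    correlated = [[1, 10], [2, 20], [3, 30]]
    anti = [[1, 30], [2, 20], [3, 10]]
    print(independence_ratio(correlated, [0, 1]))  # 7/6, above 1
    print(independence_ratio(anti, [0, 1]))        # 5/6, below 1
```

In this toy example, two attributes that rise together score above 1 under the independence normalisation, while attributes that move in opposite directions score below 1. This matches the intuition of comparing the observed support against an independence baseline.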
Keywords :
data analysis; data mining; random processes; binary data; exploratory data analysis; itemset discovery; itemset ranking; naive approach; normalisations; pattern mining; quality scores; random variables; real-valued datasets; real-valued itemsets; statistically significant pattern discovery; Data mining; Itemsets; Random variables; Reactive power; Standards; Vectors; itemsets;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
ISSN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2013.138
Filename :
6729556