DocumentCode
3165278
Title
Itemsets for Real-Valued Datasets
Author
Tatti, Nikolaj
Author_Institution
Dept. of Inf. & Comput. Sci., Aalto Univ., Espoo, Finland
fYear
2013
fDate
7-10 Dec. 2013
Firstpage
717
Lastpage
726
Abstract
Pattern mining is one of the most well-studied subfields of exploratory data analysis. While there is a significant body of literature on how to discover and rank itemsets efficiently from binary data, there is surprisingly little research on mining patterns from real-valued data. In this paper we propose a family of quality scores for real-valued itemsets. We approach the problem by casting the dataset into binary data and computing the support from this data. This naive approach requires us to select thresholds. To remedy this, instead of selecting one set of thresholds, we treat the thresholds as random variables and compute the average support. We show that we can compute this support efficiently, and we also introduce two normalisations, namely comparing the support against the independence assumption and, more generally, against the partition assumption. Our experimental evaluation demonstrates that we can discover statistically significant patterns efficiently.
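The idea of averaging support over random thresholds can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state the threshold distribution, so the sketch assumes values scaled to [0, 1] and one independent uniform threshold per item. Under that assumption a row is covered with probability equal to the product of its values on the itemset, which gives a closed form. All function names (`binarized_support`, `average_support`, `independence_baseline`, `monte_carlo_support`) and the toy data are hypothetical.

```python
import random


def binarized_support(rows, itemset, thresholds):
    """Fraction of rows whose value meets the threshold for every item."""
    covered = sum(
        all(row[i] >= thresholds[i] for i in itemset) for row in rows
    )
    return covered / len(rows)


def average_support(rows, itemset):
    """Expected support under the ASSUMED independent U(0, 1) thresholds.

    For a value v in [0, 1], P(threshold <= v) = v, so a row is covered
    with probability equal to the product of its values on the itemset.
    """
    total = 0.0
    for row in rows:
        p = 1.0
        for i in itemset:
            p *= row[i]
        total += p
    return total / len(rows)


def independence_baseline(rows, itemset):
    """Product of single-item average supports (independence assumption)."""
    base = 1.0
    for i in itemset:
        base *= average_support(rows, [i])
    return base


def monte_carlo_support(rows, itemset, samples=20000, seed=0):
    """Sanity check: sample thresholds and average the binarized support."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        thresholds = {i: rng.random() for i in itemset}
        total += binarized_support(rows, itemset, thresholds)
    return total / samples


# Toy data (hypothetical): columns 0 and 1 tend to be high together.
rows = [
    [0.9, 0.8, 0.1],
    [0.7, 0.9, 0.2],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 0.8],
]

if __name__ == "__main__":
    itemset = [0, 1]
    avg = average_support(rows, itemset)
    base = independence_baseline(rows, itemset)
    print("average support:", avg)
    print("independence baseline:", base)
    print("ratio:", avg / base)
    print("Monte Carlo estimate:", monte_carlo_support(rows, itemset))
```

On this toy data the average support of items 0 and 1 is larger than the independence baseline, which is the kind of deviation such a normalisation is meant to surface. The partition assumption mentioned in the abstract is not covered by this sketch.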
Keywords
data analysis; data mining; random processes; binary data; exploratory data analysis; itemset discovery; itemset ranking; naive approach; normalisations; pattern mining; quality scores; random variables; real-valued datasets; real-valued itemsets; statistically significant pattern discovery; Itemsets; Reactive power; Standards; Vectors
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location
Dallas, TX
ISSN
1550-4786
Type
conf
DOI
10.1109/ICDM.2013.138
Filename
6729556