• DocumentCode
    3165278
  • Title
    Itemsets for Real-Valued Datasets
  • Author
    Tatti, Nikolaj
  • Author_Institution
    Dept. of Inf. & Comput. Sci., Aalto Univ., Espoo, Finland
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    717
  • Lastpage
    726
  • Abstract
    Pattern mining is one of the most well-studied subfields in exploratory data analysis. While there is a significant amount of literature on how to discover and rank itemsets efficiently from binary data, surprisingly little research has been done on mining patterns from real-valued data. In this paper we propose a family of quality scores for real-valued itemsets. We approach the problem by casting the dataset into binary data and computing the support from the binarized data. This naive approach requires us to select thresholds. To remedy this, instead of selecting one set of thresholds, we treat the thresholds as random variables and compute the average support. We show that this support can be computed efficiently, and we introduce two normalisations: comparing the support against the independence assumption and, more generally, against the partition assumption. Our experimental evaluation demonstrates that we can discover statistically significant patterns efficiently.
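    The abstract's central idea, averaging the support of a binarized itemset over random thresholds, can be illustrated with a small Python sketch. It rests on assumptions not stated in this record: each column is min-max scaled to [0, 1], an item is set to 1 when its value is at least its threshold, and thresholds are independent and uniform on [0, 1]. Under these assumptions a row covers an itemset with probability equal to the product of its scaled values over the itemset, so the average support is the mean of those products. The independence ratio is one plausible form of the first normalisation, not necessarily the paper's exact score, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def minmax_scale(data: Sequence[Sequence[float]]) -> Matrix:
    """Scale every column to [0, 1] (an assumed preprocessing step)."""
    n_cols = len(data[0])
    scaled = [list(map(float, row)) for row in data]
    for j in range(n_cols):
        col = [row[j] for row in data]
        lo, hi = min(col), max(col)
        if hi == lo:
            raise ValueError(f"column {j} is constant and cannot be scaled")
        for r, row in enumerate(data):
            scaled[r][j] = (row[j] - lo) / (hi - lo)
    return scaled


def average_support(rows: Matrix, itemset: Sequence[int]) -> float:
    """Expected support when item i is 1 iff value >= t_i, t_i ~ U[0, 1] independent.

    P(t_i <= x) = x, so a row covers the itemset with probability prod_i x_i;
    linearity of expectation turns the expected support into a mean over rows.
    """
    return sum(math.prod(row[i] for i in itemset) for row in rows) / len(rows)


def independence_ratio(rows: Matrix, itemset: Sequence[int]) -> float:
    """Average support divided by its expected value if the items were independent.

    With independent thresholds, the independence baseline averages to the
    product of the per-item mean values (one plausible normalisation).
    """
    baseline = math.prod(sum(row[i] for row in rows) / len(rows) for i in itemset)
    return average_support(rows, itemset) / baseline


def monte_carlo_support(rows: Matrix, itemset: Sequence[int],
                        trials: int, seed: int = 0) -> float:
    """Estimate the same average by sampling thresholds and binarizing directly."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        thresholds = {i: rng.random() for i in itemset}
        covered = sum(all(row[i] >= thresholds[i] for i in itemset) for row in rows)
        total += covered / len(rows)
    return total / trials


if __name__ == "__main__":
    rows = minmax_scale([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
    print(average_support(rows, (0, 1)))
    print(independence_ratio(rows, (0, 1)))
    print(monte_carlo_support(rows, (0, 1), trials=20000))
```

    On the toy data the closed form gives 1/3 and the independence ratio 4/3, and the sampled estimate should land close to 1/3. The design point is that the closed form needs a single pass over the data, whereas the sampling version must binarize the dataset once per trial.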
  • Keywords
    data analysis; data mining; random processes; binary data; exploratory data analysis; itemset discovery; itemset ranking; naive approach; normalisations; pattern mining; quality scores; random variables; real-valued datasets; real-valued itemsets; statistically significant pattern discovery; Data mining; Itemsets; Random variables; Reactive power; Standards; Vectors; itemsets
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2013 IEEE 13th International Conference on
  • Conference_Location
    Dallas, TX
  • ISSN
    1550-4786
  • Type
    conf
  • DOI
    10.1109/ICDM.2013.138
  • Filename
    6729556