• DocumentCode
    2209006
  • Title

    Approximation of Frequentness Probability of Itemsets in Uncertain Data

  • Author

    Calders, Toon ; Garboni, Calin ; Goethals, Bart

  • Author_Institution
    Eindhoven Univ. of Technol., Eindhoven, Netherlands
  • fYear
    2010
  • fDate
    13-17 Dec. 2010
  • Firstpage
    749
  • Lastpage
    754
  • Abstract
    Mining frequent item sets from transactional datasets is a well known problem with good algorithmic solutions. Most of these algorithms assume that the input data is free from errors. Real data, however, is often affected by noise. Such noise can be represented by uncertain datasets in which each item has an existence probability. Recently, Bernecker et al. (2009) proposed the frequentness probability, i.e., the probability that a given item set is frequent, to select item sets in an uncertain database. A dynamic programming approach to evaluate this measure was given as well. We argue, however, that for the setting of Bernecker et al. (2009), that assumes independence between the items, already well-known statistical tools exist. We show how the frequentness probability can be approximated extremely accurately using a form of the central limit theorem. We experimentally evaluated our approximation and compared it to the dynamic programming approach. The evaluation shows that our approximation method is extremely accurate even for very small databases while at the same time it has much lower memory overhead and computation time.
  • Keywords
    approximation theory; data mining; dynamic programming; statistical distributions; transaction processing; central limit theorem; dynamic programming; frequent itemset mining; frequentness probability approximation; statistical tool; transactional dataset; uncertain data;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining (ICDM), 2010 IEEE 10th International Conference on
  • Conference_Location
    Sydney, NSW
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4244-9131-5
  • Electronic_ISBN
    1550-4786
  • Type

    conf

  • DOI
    10.1109/ICDM.2010.42
  • Filename
    5694033