• DocumentCode
    2866053
  • Title

    Mining approximate frequent itemsets from noisy data

  • Author

    Liu, Jinze ; Paulsen, Susan ; Wang, Wei ; Nobel, Andrew ; Prins, Jan

  • Author_Institution
    Dept. of Comput. Sci., North Carolina Univ., Chapel Hill, NC, USA
  • fYear
    2005
  • fDate
    27-30 Nov. 2005
  • Abstract
    Frequent itemset mining is a popular and important first step in analyzing data sets across a broad range of applications. The traditional, "exact" approach for finding frequent itemsets requires that every item in the itemset occur in each supporting transaction. However, real data is typically subject to noise, and in the presence of such noise, traditional itemset mining may fail to detect relevant itemsets, particularly large itemsets, which are more vulnerable to noise. In this paper we propose approximate frequent itemsets (AFI) as a noise-tolerant itemset model. In addition to the usual requirement for sufficiently many supporting transactions, the AFI model places constraints on the fraction of errors permitted in each item column and the fraction of errors permitted in a supporting transaction. Taken together, these constraints winnow out the approximate itemsets that exhibit systematic errors. In the context of a simple noise model, we demonstrate that AFI is better at recovering underlying data patterns, while identifying fewer spurious patterns, than either the exact frequent itemset approach or the existing error-tolerant itemset approach of Yang et al.
  • Keywords
    data analysis; data mining; approximate frequent itemset; data patterns; data sets analysis; error tolerant itemset; exact frequent itemset; frequent itemset mining; noise-tolerant itemset model; noisy data; Application software; Association rules; Computer science; Context modeling; Data analysis; Data mining; Itemsets; Operations research; Relational databases; Statistical analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, Fifth IEEE International Conference on
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2278-5
  • Type

    conf

  • DOI
    10.1109/ICDM.2005.93
  • Filename
    1565766
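
The three AFI conditions stated in the abstract (enough supporting transactions, a bounded fraction of errors in each supporting transaction, and a bounded fraction of errors in each item column) can be illustrated with a small membership check. This is only a sketch under assumptions, not the paper's mining algorithm: the function name `is_afi`, the parameter names `min_support`, `eps_row` and `eps_col`, the 0/1 matrix representation, and the toy data are all hypothetical, and the check only verifies a given candidate rather than searching for itemsets.

```python
from typing import Sequence


def is_afi(
    matrix: Sequence[Sequence[int]],
    rows: Sequence[int],
    items: Sequence[int],
    min_support: float,
    eps_row: float,
    eps_col: float,
) -> bool:
    """Check whether (rows, items) meets the AFI-style constraints.

    matrix      -- 0/1 transaction-by-item matrix (assumed representation)
    rows        -- indices of the candidate supporting transactions
    items       -- indices of the items in the candidate itemset
    min_support -- minimum fraction of all transactions that must support it
    eps_row     -- max fraction of missing items allowed per transaction
    eps_col     -- max fraction of missing transactions allowed per item
    """
    rows, items = list(rows), list(items)
    if not rows or not items:
        return False

    # Usual support requirement: sufficiently many supporting transactions.
    if len(rows) < min_support * len(matrix):
        return False

    # Row constraint: each supporting transaction may miss only a
    # bounded fraction of the items in the itemset.
    for r in rows:
        missing = sum(1 for c in items if not matrix[r][c])
        if missing > eps_row * len(items):
            return False

    # Column constraint: each item may be absent from only a bounded
    # fraction of the supporting transactions.
    for c in items:
        missing = sum(1 for r in rows if not matrix[r][c])
        if missing > eps_col * len(rows):
            return False

    return True


# Toy data (made up for illustration): items 0-2 form a noisy pattern
# over transactions 0-3; transaction 4 is unrelated.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]

noisy_ok = is_afi(data, [0, 1, 2, 3], [0, 1, 2], 0.8, 0.34, 0.25)
exact_ok = is_afi(data, [0, 1, 2, 3], [0, 1, 2], 0.8, 0.0, 0.0)
```

Setting both tolerances to zero reduces the check to the exact frequent itemset requirement, which is why the noisy pattern in the toy data is rejected in the second call but accepted in the first.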