• DocumentCode
    2986402
  • Title

    A tree-based approach for efficiently mining approximate frequent itemsets

  • Author

    Koh, Jia-Ling ; Tu, Vi-Lang

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Taiwan Normal Univ., Taipei, Taiwan
  • fYear
    2010
  • fDate
    19-21 May 2010
  • Firstpage
    25
  • Lastpage
    36
  • Abstract
    The strategies for mining frequent itemsets, which is the essential part of discovering association rules, have been widely studied over the last decade. In real-world datasets, it is possible to discover multiple fragmented patterns but miss the longer true patterns due to random noise and errors in the data. Therefore, a number of methods have been proposed recently to discover approximate frequent itemsets. However, a challenge of providing an efficient algorithm for solving this problem is how to avoid costly candidate generation and test. In this paper, an algorithm, named FP-AFI (FP-tree based Approximate Frequent Itemsets mining algorithm), is developed to discover approximate frequent itemsets from a FP-tree-like structure. We define a recursive function for getting the set of transactions which fault-tolerant contain an itemset P. The patterns in the fault-tolerant supporting transactions of P are represented by the conditional AFP-trees of P. Moreover, to avoid re-constructing the tree structure in the mining process, two pseudo-projection operations on AFP-trees are provided to obtain the conditional AFP-trees of a candidate itemset systematically. Consequently, the approximate support of a candidate itemset and the item supports of each item in the candidate are obtained easily from the conditional AFP-trees. Hence, the constrain test of a candidate itemset is performed efficiently without additional database scan. The experimental results show that the FP-AFI algorithm performs much better than the FP-Apriori and the AFI algorithms in efficiency especially when the size of data set is large and the minimum threshold of approximate support is small. Moreover, the execution time of FP-AFI is scalable even when the error threshold parameters become large.
  • Keywords
    Data mining; Itemsets; Approximate frequent itemset mining; FP-tree structure;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Research Challenges in Information Science (RCIS), 2010 Fourth International Conference on
  • Conference_Location
    Nice, France
  • ISSN
    2151-1349
  • Print_ISBN
    978-1-4244-4839-5
  • Electronic_ISBN
    2151-1349
  • Type

    conf

  • DOI
    10.1109/RCIS.2010.5507360
  • Filename
    5507360