• DocumentCode
    1040398
  • Title

    Frequent Closed Sequence Mining without Candidate Maintenance

  • Author

    Wang, Jianyong ; Han, Jiawei ; Li, Chun

  • Author_Institution
    Tsinghua Univ., Beijing
  • Volume
    19
  • Issue
    8
  • fYear
    2007
  • Firstpage
    1042
  • Lastpage
    1056
  • Abstract
    Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only a more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and- test paradigm, which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long. In this paper, we present BIDE, an efficient algorithm for mining frequent closed sequences without candidate maintenance. It adopts a novel sequence closure checking scheme called Bl-Directional Extension and prunes the search space more deeply compared to the previous algorithms by using the BackScan pruning method. A thorough performance study with both sparse and dense, real, and synthetic data sets has demonstrated that BIDE significantly outperforms the previous algorithm: It consumes an order(s) of magnitude less memory and can be more than an order of magnitude faster. It is also linearly scalable in terms of database size.
  • Keywords
    data mining; database management systems; BIDE; BackScan pruning method; bi-directional extension; data mining; database size; frequent closed sequence mining; frequent pattern mining algorithm; sequence closure checking scheme; Application software; Association rules; Bidirectional control; Computer bugs; Data mining; Itemsets; Large-scale systems; Open source software; Runtime; Spatial databases; BI-Directional Extension.; Data mining; frequent closed sequences;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2007.1043
  • Filename
    4262535