Abstract :
Constraints applied on classic frequent patterns are too strict and may cause interesting patterns to be missed. Hence, researchers have proposed to mine a more relaxed version of frequent patterns, where transactions are allowed to miss some items in the itemset they support. Patterns exhibiting such "faults" are called frequent fault-tolerant patterns (FFT-patterns) if they are significant in number. In this paper, the term "pattern" is distinguished from "item- set" as referring to a pair (tidset times itemset). Unlike classical frequent patterns, the number of FFT- patterns grows exponentially not only with the number of items, but also with the number of transactions. Since the latter may reach millions, mining FFT-patterns by enumerating them becomes infeasible. Hence, the challenge is to represent FFT-patterns concisely without losing any useful information. To address this, we draw on the observation that, in transactional databases, the transactions themselves are not important from the data mining point-of- view; i.e. researchers are interested in finding itemsets contained in lots of transactions, rather than in the transactions per se. Therefore, we propose to mine only the frequent itemsets along with the statistical information of the supporting transaction sets, rather than enumerate entire FFT- patterns. Then we present our approach - the BIAS framework, consisting of Backtracking algorithm, Integer Linear Programming (ILP) constraints, and aggregation statistics to solve this problem. Algorithms under this framework not only increase the efficiency of the FFT-patterns mining process by more than an order of magnitude, but also provide a more comprehensive analysis of FFT-Patterns.
Keywords :
backtracking; data mining; database management systems; integer programming; linear programming; statistical analysis; transaction processing; aggregation statistics; backtracking algorithm; data mining; frequent fault-tolerant pattern mining; integer linear programming; statistical information; transactional database; Algorithm design and analysis; Combinatorial mathematics; Constraint optimization; Data mining; Fault tolerance; Integer linear programming; Itemsets; Pattern analysis; Statistics; Transaction databases;