DocumentCode :
2824846
Title :
Mining frequent closed itemsets over data stream based on Bitvector and digraph
Author :
Zhang, Guanglu ; Lei, Jingsheng ; Wu, Xinghui
Author_Institution :
Sch. of Math. & Stat., Hainan Normal Univ., Haikou, China
Volume :
2
fYear :
2010
fDate :
21-24 May 2010
Abstract :
A data stream is a continuous, huge, fast-changing, infinite sequence of data elements. The nature of streaming data makes it essential to use online algorithms that require only one scan over the data for knowledge discovery. Mining frequent patterns on streaming data is a new and challenging problem. Recent research mainly focuses on mining frequent itemsets over data streams. However, when the support threshold is small, the number of frequent itemsets is staggering. Moreover, the set of frequent closed itemsets completely contains the information of the frequent itemsets, while the total number of frequent closed itemsets is much smaller than that of frequent itemsets. Therefore, mining frequent closed itemsets is a better choice. In this paper, a new algorithm named MFCIDS_BD (Mining Frequent Closed Itemsets over Data Stream Based on Bit-vector and Digraph) is proposed to mine all frequent closed itemsets in the transaction sliding window over a data stream. MFCIDS_BD uses a bit-vector table based data structure, an effective bit-sequence representation of items, to dynamically maintain all information over the transaction sliding window. A digraph-based data structure is developed in MFCIDS_BD to mine all frequent closed itemsets depth-first. The maximum number of nodes in the digraph does not exceed the total number of items in the data stream. In the mining process, MFCIDS_BD uses simple bitwise AND operations to calculate the support of an itemset. MFCIDS_BD effectively saves memory and improves speed. Experimental results show that MFCIDS_BD is effective and efficient.
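The bit-sequence idea in the abstract can be illustrated with a minimal sketch. It is not the authors' MFCIDS_BD implementation: the class name `BitVectorWindow`, its methods, and the choice to store each item's bit sequence in a Python integer are assumptions made for illustration, and the digraph-based closed-itemset search is omitted. Each item keeps one bit per transaction in the sliding window, and the support of an itemset is the number of set bits left after ANDing the bit sequences of its items.

```python
class BitVectorWindow:
    """Hypothetical sliding window keeping one bit sequence per item."""

    def __init__(self, width):
        self.width = width              # number of transactions in the window
        self.mask = (1 << width) - 1    # keeps only the newest `width` bits
        self.bits = {}                  # item -> bit sequence stored as an int
        self.seen = 0                   # transactions currently in the window

    def add(self, transaction):
        # Slide the window: shift every sequence left by one position;
        # the bit of the oldest transaction falls off through the mask.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]     # item no longer occurs in the window
        # Mark the newest transaction in the lowest bit.
        for item in set(transaction):
            self.bits[item] = self.bits.get(item, 0) | 1
        self.seen = min(self.seen + 1, self.width)

    def support(self, itemset):
        # Start with one bit per transaction in the window, then AND.
        acc = (1 << self.seen) - 1
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")


window = BitVectorWindow(width=3)
for t in ["ab", "bc", "abc", "a"]:
    window.add(t)
# The window now holds the last three transactions: "bc", "abc", "a".
support_ab = window.support("ab")
support_bc = window.support("bc")
```

With a window of three transactions, `support_ab` counts only "abc", while `support_bc` counts both "bc" and "abc".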
Keywords :
data mining; data structures; directed graphs; MFCIDS-BD algorithm; bit AND operations; bit sequence representation; bit vector table; data streaming; data structure; digraph; frequent closed itemset mining; frequent pattern mining; knowledge discovery; online algorithms; transaction sliding window; Computer science; Data mining; Data structures; Information science; Itemsets; Mathematics; Pattern analysis; Performance analysis; Statistics; Tree data structures; data mining; data stream; frequent closed itemsets;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Future Computer and Communication (ICFCC), 2010 2nd International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-5821-9
Type :
conf
DOI :
10.1109/ICFCC.2010.5497389
Filename :
5497389