Title :
Online mining (recently) maximal frequent itemsets over data streams
Author :
Hua-Fu Li ; Suh-Yin Lee ; Man-Kwan Shan
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Chiao Tung Univ., Hsinchu, Taiwan
Abstract :
A data stream is a massive, open-ended sequence of data elements continuously generated at a rapid rate. Mining data streams is more difficult than mining static databases because of the huge volume, high speed, and continuous nature of streaming data. In this paper, we propose a new one-pass algorithm called DSM-MFI (Data Stream Mining for Maximal Frequent Itemsets), which mines the set of all maximal frequent itemsets in landmark windows over data streams. A new summary data structure called the summary frequent itemset forest (abbreviated as SFI-forest) is developed to incrementally maintain the essential information about the maximal frequent itemsets embedded in the stream so far. Theoretical analysis and experimental studies show that the proposed algorithm is efficient and scalable for mining the set of all maximal frequent itemsets over the entire history of the data streams.
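As an illustration of the terms used in the abstract, the following minimal Python sketch computes the maximal frequent itemsets over a landmark window, that is, over every transaction seen since the start of the stream. It is a brute-force baseline and is not DSM-MFI; it does not use an SFI-forest. The class name `NaiveLandmarkMiner`, the `min_support` parameter, and the sample transactions are assumptions made only for this example.

```python
from collections import Counter
from itertools import combinations


class NaiveLandmarkMiner:
    """Brute-force illustration only; NOT DSM-MFI and not an SFI-forest."""

    def __init__(self, min_support):
        self.min_support = min_support  # relative support threshold in (0, 1]
        self.counts = Counter()         # itemset -> number of supporting transactions
        self.n_transactions = 0         # transactions seen since the landmark

    def add(self, transaction):
        """Consume one transaction; each transaction is read exactly once."""
        items = sorted(set(transaction))
        self.n_transactions += 1
        # Count every non-empty subset (exponential in the transaction length).
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                self.counts[frozenset(subset)] += 1

    def maximal_frequent_itemsets(self):
        """Return frequent itemsets that have no frequent proper superset."""
        threshold = self.min_support * self.n_transactions
        frequent = [s for s, c in self.counts.items() if c >= threshold]
        return {s for s in frequent if not any(s < other for other in frequent)}


# Hypothetical stream of five transactions.
stream = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
miner = NaiveLandmarkMiner(min_support=0.5)
for transaction in stream:
    miner.add(transaction)
result = miner.maximal_frequent_itemsets()
```

With these five transactions and a support threshold of 0.5, the itemset {a, b, c} appears only twice and is not frequent, so the maximal frequent itemsets are the three pairs {a, b}, {a, c}, and {b, c}. Counting every subset explicitly takes memory exponential in the transaction length, which is the kind of cost a compact summary structure is meant to avoid.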
Keywords :
data mining; data structures; DSM-MFI; Data Stream Mining for Maximal Frequent Itemsets; SFI-forest; data streams; landmark windows; one-pass algorithm; online mining; static databases; summary data structure; Algorithm design and analysis; Computer science; Data engineering; Data mining; Data models; Data structures; History; Itemsets; Measurement; Transaction databases;
Conference_Title :
Research Issues in Data Engineering: Stream Data Mining and Applications, 2005. RIDE-SDMA 2005. 15th International Workshop on
Print_ISBN :
0-7695-2390-0
DOI :
10.1109/RIDE.2005.13