Author :
Raissi, Chedy ; Poncelet, Pascal ; Teisseire, Maguelonne
Abstract :
Many recent real-world applications, such as network traffic monitoring, intrusion detection systems, sensor network data analysis, click stream mining and dynamic tracing of financial transactions, call for studying a new kind of data. This kind of data, called stream data, is a continuous, potentially infinite flow of information, as opposed to the finite, statically stored data sets extensively studied by the data mining community. An important application is to mine data streams for interesting patterns or anomalies as they happen. In data stream applications, the volume of data is usually too large to be stored on permanent devices or in main memory, or to be scanned thoroughly more than once. In this paper we propose a new approach, called SPEED (sequential patterns efficient extraction in data streams), to identify frequent maximal sequential patterns in a data stream. The main originality of our mining method is a novel data structure that maintains frequent sequential patterns, coupled with a fast pruning strategy. At any time, users can issue requests for frequent maximal sequences over an arbitrary time interval. Furthermore, our approach produces approximate support answers whose error is guaranteed not to exceed a user-defined frequency error threshold. Finally, the proposed method is evaluated through a series of experiments on different datasets.
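The notion of a frequent maximal sequential pattern can be illustrated on a single finite batch of data sequences. The following is a minimal sketch under stated assumptions, not the SPEED algorithm: it does not reproduce the paper's data structure, its pruning strategy, or its error-bounded approximate support over time intervals. It only shows the definitions: a sequence is an ordered list of itemsets, a pattern is contained in a data sequence when its itemsets are subsets of distinct, ordered itemsets of that sequence, and a frequent pattern is maximal when no other frequent pattern contains it. The helpers `seq`, `is_subsequence`, `support` and `maximal_frequent`, and the `min_support` parameter, are hypothetical names chosen for illustration.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Illustrative sketch only (hypothetical helpers, not the SPEED method):
# a sequence is an ordered tuple of itemsets, e.g. <(a)(b c)>.
Itemset = FrozenSet[str]
Seq = Tuple[Itemset, ...]


def seq(*itemsets: str) -> Seq:
    """Build a sequence from space-separated itemsets, e.g. seq("a", "b c")."""
    return tuple(frozenset(s.split()) for s in itemsets)


def is_subsequence(s: Seq, t: Seq) -> bool:
    """True if each itemset of s is a subset of a distinct itemset of t, in order.

    Greedy earliest matching is sufficient: matching s[k] as early as
    possible never removes a later option for s[k + 1].
    """
    pos = 0
    for itemset in s:
        # Advance until an itemset of t contains the current itemset of s.
        while pos < len(t) and not itemset <= t[pos]:
            pos += 1
        if pos == len(t):
            return False
        pos += 1  # the next itemset of s must match a later itemset of t
    return True


def support(pattern: Seq, database: Sequence[Seq]) -> float:
    """Fraction of data sequences in the batch that contain the pattern."""
    if not database:
        return 0.0
    hits = sum(1 for data_seq in database if is_subsequence(pattern, data_seq))
    return hits / len(database)


def maximal_frequent(candidates: Sequence[Seq], database: Sequence[Seq],
                     min_support: float) -> List[Seq]:
    """Keep frequent candidates that no other frequent candidate contains."""
    frequent = [c for c in candidates if support(c, database) >= min_support]
    return [
        p for p in frequent
        if not any(p != q and is_subsequence(p, q) for q in frequent)
    ]


if __name__ == "__main__":
    batch = [seq("a", "b c", "d"), seq("a", "c"), seq("b", "a", "c d")]
    candidates = [seq("a"), seq("c"), seq("a", "c"), seq("a", "d"), seq("b")]
    for pattern in maximal_frequent(candidates, batch, min_support=0.6):
        print([sorted(itemset) for itemset in pattern])
```

With a minimum support of 0.6 on this toy batch, <(a)> and <(c)> are frequent but not maximal, because both are contained in the frequent pattern <(a)(c)>. A streaming method such as the one described above must maintain this information incrementally instead of rescanning the batch, which is precisely what this sketch does not attempt.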
Keywords :
data analysis; data mining; data structures; data stream mining; data structure; frequency error threshold; frequent maximal sequence; maximal sequential pattern mining; pruning strategy; sequential pattern extraction; sequential pattern identification; stream data analysis; Frequency; Intelligent systems; Intrusion detection; Itemsets; Monitoring; Sensor systems and applications; Telecommunication traffic; Traffic control; Data Streams; Sequential Patterns