DocumentCode :
3144009
Title :
SMM: A data stream management system for knowledge discovery
Author :
Thakkar, Hetal ; Laptev, Nikolay ; Mousavi, Hamid ; Mozafari, Barzan ; Russo, Vincenzo ; Zaniolo, Carlo
fYear :
2011
fDate :
11-16 April 2011
Firstpage :
757
Lastpage :
768
Abstract :
The problem of supporting data mining applications proved to be difficult for database management systems, and it is now proving to be very challenging for data stream management systems (DSMSs), where the limitations of SQL are made even more severe by the requirements of continuous queries. The major technical advances that were achieved separately on DSMSs and on data stream mining algorithms have failed to converge and produce powerful data stream mining systems. Such systems, however, are essential since the traditional pull-based approach of cache mining is no longer applicable, and the push-based computing mode of data streams and their bursty traffic complicate application development. For instance, to write mining applications with quality of service (QoS) levels approaching those of DSMSs, a mining analyst would have to contend with many arduous tasks, such as support for data buffering, complex storage and retrieval methods, scheduling, fault tolerance, synopsis management, load shedding, and query optimization. Our Stream Mill Miner (SMM) system solves these problems by providing a data stream mining workbench that combines the ease of specifying high-level mining tasks, as in Weka, with the performance and QoS guarantees of a DSMS. This is accomplished in three main steps. The first is an open and extensible DSMS architecture where KDD queries can be easily expressed as user-defined aggregates (UDAs); our system combines that with the efficiency of synoptic data structures and mining-aware load shedding and optimizations. The second key component of SMM is its integrated library of fast mining algorithms that are light enough to be effective on data streams. The third advanced feature of SMM is a Mining Model Definition Language (MMDL) that allows users to define the flow of mining tasks, integrated with a simple box-and-arrow GUI, to shield the mining analyst from the complexities of lower-level queries. SMM is the first DSMS capable of online mining, and this paper describes its architecture, design, and performance on mining queries.
Keywords :
SQL; data mining; database management systems; query languages; query processing; DSMS architecture; KDD query; SMM; SQL; cache mining; data buffering; data mining application; data stream management system; database management system; fault tolerance; integrated library; knowledge discovery; mining aware load shedding; mining model definition language; pull based approach; push based computing mode; quality of service level; query optimization; retrieval method; stream mill miner system; synopsis management; synoptic data structure; user defined aggregate; Aggregates; Data mining; Graphical user interfaces; Humidity; Libraries; Quality of service; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering (ICDE), 2011 IEEE 27th International Conference on
Conference_Location :
Hannover
ISSN :
1063-6382
Print_ISBN :
978-1-4244-8959-6
Electronic_ISBN :
1063-6382
Type :
conf
DOI :
10.1109/ICDE.2011.5767879
Filename :
5767879