Title :
Data stream mining to address big data problems
Author :
Olmezogullari, E. ; Ari, I. ; Celebi, O.F. ; Ergut, Salih
Author_Institution :
Dept. of Computer Engineering, Ozyegin Univ., İstanbul, Turkey
Abstract :
Today, the IT world is trying to cope with “big data” problems (data volume, velocity, variety, veracity) on the path to obtaining useful information. In this paper, we present implementation details and performance results of realizing “online” Association Rule Mining (ARM) over big data streams for the first time in the literature. Specifically, we added the Apriori and FP-Growth algorithms for stream mining inside an event processing engine called Esper. Using this system, we compared the two algorithms over data from the LastFM social music site, using tumbling windows. The better-performing FP-Growth was selected and used to create a real-time rule-based recommendation engine. Our most important findings show that, compared with offline rule mining, online association rule mining can generate (1) more rules, (2) much faster and more efficiently, and (3) much sooner. In addition, we have found many interesting and realistic musical preference rules such as “George Harrison⇒Beatles”. We hope that our findings can shed light on the design and implementation of other big data analytics systems in the future.
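Illustrative sketch (not part of the original record, and not the authors' Esper implementation): the abstract's idea of mining association rules per tumbling window can be approximated in a few lines of Python. The sketch assumes the stream is a sequence of sets of artist names and uses a simplified Apriori-style pass limited to item pairs; FP-Growth is not shown. The function names `tumbling_windows` and `mine_pair_rules`, the thresholds, and the sample sessions are all hypothetical.

```python
from collections import Counter
from itertools import combinations, islice


def tumbling_windows(stream, size):
    """Yield consecutive, non-overlapping batches of up to `size` transactions."""
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window


def mine_pair_rules(window, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) rules for one window."""
    n = len(window)
    item_counts = Counter(item for t in window for item in set(t))
    # Apriori pruning: only individually frequent items can form frequent pairs.
    frequent = {i for i, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter(
        pair
        for t in window
        for pair in combinations(sorted(set(t) & frequent), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Try both directions of the pair as a candidate rule x => y.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


# Made-up listening sessions for illustration only (not LastFM data).
sessions = [
    {"George Harrison", "Beatles"},
    {"George Harrison", "Beatles", "Queen"},
    {"Beatles", "Queen"},
    {"Beatles"},
    {"Queen", "Muse"},
    {"Muse"},
    {"Queen", "Muse"},
    {"Muse", "Beatles"},
]

# Each window is mined independently, so rules appear as soon as a window closes.
rules_per_window = [
    mine_pair_rules(w, min_support=0.5, min_confidence=0.9)
    for w in tumbling_windows(sessions, size=4)
]
```

Because every window is mined on its own, rules become available as each window closes rather than after the whole data set has been collected, which is the sense in which online mining can report rules sooner than an offline pass.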
Keywords :
data analysis; data mining; information retrieval; music; recommender systems; social networking (online); Apriori algorithm; Beatles; Esper; FP-Growth algorithm; George Harrison; IT world; LastFM social music site data; big data analytics system; big data problem; data stream mining; data variety; data velocity; data veracity; data volume; event processing engine; musical preference rules; online ARM; online association rule mining; real-time rule-based recommendation engine; rule generation; tumbling windows; Association rules; Big data; Engines; Real-time systems; Software; Software algorithms; Apriori; Data stream mining; FP-Growth; association rule mining; complex event processing;
Conference_Title :
Signal Processing and Communications Applications Conference (SIU), 2013 21st
Conference_Location :
Haspolat
Print_ISBN :
978-1-4673-5562-9
Electronic_ISBN :
978-1-4673-5561-2
DOI :
10.1109/SIU.2013.6531483