Efficiently Finding Top-K Items from Evolving Distributed Data Streams

Author

Baoyuan Qi ; Gang Ma ; Zhongzhi Shi ; Wei Wang

Author_Institution

Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China

fYear

2014

fDate

27-29 Aug. 2014

Firstpage

137

Lastpage

140

Abstract

The problem of efficiently finding top-k frequent items has attracted much attention in recent years. Storage constraints at the processing nodes and the intrinsically evolving nature of data streams are the two main challenges. In this paper, we propose a method that tackles these two challenges using the space-saving algorithm and a gossip-based algorithm, respectively. Our method is implemented on SAMOA (Scalable Advanced Massive Online Analysis), a machine learning framework. The experimental results show its effectiveness and scalability.
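
For readers unfamiliar with the space-saving idea named in the abstract, the following is a minimal single-node sketch in Python, not the authors' implementation and not SAMOA code. It assumes the standard formulation of Space-Saving: keep at most a fixed number of counters and, when a new item arrives and all counters are in use, replace the item with the smallest count and let the newcomer inherit that count plus one. The class name `SpaceSaving`, the methods `offer` and `top_k`, and the `capacity` parameter are hypothetical names chosen for illustration. The gossip-based exchange between nodes is not shown, since the abstract does not describe it.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Illustrative Space-Saving summary with at most `capacity` counters."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # item -> estimated count
        self.errors: Dict[Hashable, int] = {}  # item -> maximum overestimation

    def offer(self, item: Hashable) -> None:
        """Process one stream element."""
        if item in self.counts:
            # Already monitored: just increment its counter.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # Free counter available: start monitoring the item.
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # All counters in use: evict the item with the smallest count.
            # A linear scan is used here for brevity (assumption: small capacity).
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            # The newcomer inherits the evicted count, which bounds its error.
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top_k(self, k: int) -> List[Tuple[Hashable, int]]:
        """Return the k monitored items with the largest estimated counts."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:k]


summary = SpaceSaving(capacity=3)
for element in ["a", "b", "a", "c", "a", "b", "d"]:
    summary.offer(element)
print(summary.top_k(1))
```

Memory use is bounded by `capacity` regardless of stream length, which is how this family of summaries addresses the storage constraint. Estimated counts never undercount a monitored item, and the `errors` entry bounds how far each estimate may overshoot.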

Keywords

data mining; learning (artificial intelligence); SAMOA framework; evolving distributed data streams; gossip-based algorithm; scalable advanced massive online analysis machine learning framework; space-saving algorithm; top-k frequent items; Data mining; Distributed databases; Machine learning algorithms; Monitoring; Peer-to-peer computing; Protocols; Radiation detectors

fLanguage

English

Publisher

IEEE

Conference_Title

Semantics, Knowledge and Grids (SKG), 2014 10th International Conference on

Conference_Location

Beijing

Type

conf

DOI

10.1109/SKG.2014.18

Filename

6964679

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=162496