Title :
CosDic: Towards a Comprehensive System for Knowledge Discovery in Large-Scale Data: Architecture, Implementation and Case Studies
Author :
Wu, Bin ; Yang, Shengqi ; Zhao, Haizhou ; Gao, Yuan ; Suo, Lijun
Abstract :
The continued exponential growth in both the volume and the complexity of information poses a new challenge to the specific requirements of analysts, researchers and intelligence providers. In this paper, to move this research toward practice, we present a prototype of CosDic, our system under ongoing construction for knowledge discovery from extremely large-scale datasets. The core infrastructure of CosDic is deployed on a distributed cluster using the MapReduce platform. To support mining tasks ranging from gigabytes to petabytes, we carefully designed the system from architecture to particular algorithms, from lower-layer construction to the upper-layer public service interface, and with attention to both effectiveness and efficiency. Moreover, to illustrate its functionality, we apply CosDic to a huge real-world dataset and demonstrate an integrated analysis procedure, from initial raw data preprocessing to final knowledge discovery. We show that CosDic performs well in such cloud-scale data computing.
Keywords :
Biology computing; Cloud computing; Computer architecture; Distributed computing; Intelligent agent; Internet; Large-scale systems; Parallel processing; Pervasive computing; Prototypes; MapReduce; distributed system; knowledge discovery
Conference_Title :
Web Intelligence and Intelligent Agent Technologies, 2009. WI-IAT '09. IEEE/WIC/ACM International Joint Conferences on
Conference_Location :
Milan, Italy
Print_ISBN :
978-0-7695-3801-3
Electronic_ISBN :
978-1-4244-5331-3
DOI :
10.1109/WI-IAT.2009.117