Title :
Cloud-based classification of text documents using the Gridgain platform
Author :
Samovsky, M. ; Kacur, T.
Author_Institution :
Dept. of Cybern. & Artificial Intell., Tech. Univ. in Kosice, Kosice, Slovakia
Abstract :
The motivation for the research presented in this paper is to use the storage and computational capabilities of cloud computing for text mining tasks. Cloud computing is nowadays a favored approach in data analysis and related fields because it provides data storage and computational capabilities as services. The main aim of our research is to design and develop an experimental cloud platform for text mining tasks. In this paper we describe the design and implementation of a distributed tree-based algorithm for text categorization. We used our own implementation of a decision tree classification algorithm and the Gridgain framework for its cloud implementation. The cloud provides storage services for handling large data collections and also increases computational efficiency, because the algorithm is implemented in a distributed fashion. We describe the experiments we performed on a private cloud using two datasets and analyze the results.
Keywords :
cloud computing; data analysis; data mining; decision trees; grid computing; pattern classification; storage management; text analysis; Gridgain platform; cloud computing storage; cloud-based text document classification; computational capabilities; data storage; decision tree classification algorithm; distributed tree-based algorithm design; distributed tree-based algorithm implementation; large data collection handling; private cloud; storage services; text categorization; text mining tasks; Organizations; Vectors
Conference_Title :
Applied Computational Intelligence and Informatics (SACI), 2012 7th IEEE International Symposium on
Conference_Location :
Timisoara
Print_ISBN :
978-1-4673-1013-0
Electronic_ISBN :
978-1-4673-1012-3
DOI :
10.1109/SACI.2012.6250009