• DocumentCode
    3175808
  • Title

    Cloud-based clustering of text documents using the GHSOM algorithm on the GridGain platform

  • Author

    Sarnovsky, Martin ; Ulbrik, Z.

  • Author_Institution
    Dept. of Cybern. & Artificial Intell., Tech. Univ. in Kosice, Kosice, Slovakia
  • fYear
    2013
  • fDate
    23-25 May 2013
  • Firstpage
    309
  • Lastpage
    313
  • Abstract
    This paper provides an overview of our research on the efficient use of distributed computing concepts for text-mining tasks. It describes the GHSOM (Growing Hierarchical Self-Organizing Maps) algorithm for clustering text documents and proposes the design and implementation of a distributed version of this approach. The proposed implementation uses the JBOWL framework as its text-mining base. For distribution, we used the MapReduce paradigm as implemented in the GridGain framework, which served as the cloud application platform. Experiments were performed on the standard Reuters dataset, using a simple private cloud infrastructure for testing.
  • Keywords
    cloud computing; data mining; parallel programming; pattern clustering; self-organising feature maps; text analysis; GHSOM algorithm; GridGain platform; JBOWL framework; Java bag-of-words library; MapReduce paradigm; Reuters dataset; cloud application platform; cloud-based clustering; growing hierarchical self-organizing maps algorithm; private cloud infrastructure; text documents clustering; text mining; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Informatics; Java; Neurons; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Applied Computational Intelligence and Informatics (SACI), 2013 IEEE 8th International Symposium on
  • Conference_Location
    Timisoara
  • Print_ISBN
    978-1-4673-6397-6
  • Type

    conf

  • DOI
    10.1109/SACI.2013.6608988
  • Filename
    6608988