• DocumentCode
    2023070
  • Title

    A MapReduce based distributed LSI

  • Author

    Liu, Yang ; Li, Maozhen ; Hammoud, Suhel ; Alham, Nasullah Khalid ; Ponraj, Mahesh

  • Author_Institution
    Sch. of Eng. & Design, Brunel Univ., Uxbridge, UK
  • Volume
    6
  • fYear
    2010
  • fDate
    10-12 Aug. 2010
  • Firstpage
    2978
  • Lastpage
    2982
  • Abstract
    Latent Semantic Indexing (LSI) is a widely used text mining technique owing to its effectiveness in dealing with the problems of synonymy and polysemy within a proper matrix scale. However, LSI is computationally intensive, especially when processing large-scale data. An effective solution is to increase the computational power available to LSI by using multiple computing nodes. In this paper we propose a novel MapReduce-based distributed LSI that uses the Hadoop distributed computing architecture to implement the K-means algorithm to cluster the documents and then applies LSI to the clustered results. We evaluate the performance of the proposed MapReduce-based LSI and compare it with standalone LSI. The results show a great improvement in LSI's performance in terms of speed.
  • Keywords
    data mining; indexing; matrix algebra; pattern clustering; text analysis; Hadoop distributed computing architecture; MapReduce; distributed LSI; large scale data processing; latent semantic indexing; multiple computing nodes; text mining technology; Clustering algorithms; Computational modeling; Indexing; Large scale integration; Matrix decomposition; Semantics; Sockets; Distributed computing; K-mean; LSI; MapReduce; SVD;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
  • Conference_Location
    Yantai, Shandong
  • Print_ISBN
    978-1-4244-5931-5
  • Type

    conf

  • DOI
    10.1109/FSKD.2010.5569083
  • Filename
    5569083