• DocumentCode
    3183053
  • Title

    A parallel approach to context-based term weighting

  • Author

    Arora, Silky ; Chakravarty, Shampa

  • Author_Institution
    Dept. of Inf. Technol., Netaji Subhas Inst. of Technol., New Delhi, India
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    951
  • Lastpage
    956
  • Abstract
    Information retrieval and extraction rely on estimating the relevance of words in a large corpus of documents or text. One approach to measuring relevance analyzes the importance of words based on their statistical distribution within a document. Another approach derives from their linguistic relevance within a logically perceived context. The literature presents a body of work employing both statistical and contextual approaches. The current challenge is to enhance the performance of document analysis and clustering systems. Since the massive explosion of information and raw data available on the web, analyzing that data demands more rigorous computation and processing. Given that these systems operate on a widely distributed environment as their backbone platform, there is an urgent need to develop techniques that scale their performance across multiple processors. We propose a parallelized strategy to estimate the statistical as well as contextual relevance of words, employing a master-slave configuration on a cluster of processors. Our parallel algorithm has been successfully tested on a self-made Beowulf cluster comprising ten nodes, showing significant performance improvement over a single processor.
  • Keywords
    information retrieval; parallel algorithms; pattern clustering; statistical distributions; text analysis; Beowulf cluster; clustering system; context-based term weighting; document analysis; document corpus; information extraction; linguistic relevance; parallel algorithm; parallel approach; statistical distribution; text corpus; Algorithm design and analysis; Clustering algorithms; Context; Measurement; Pragmatics; Program processors; Switches; Amdahl's Law; Cluster computing; Context based Text Classification; Retrieval; TF-IDF
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information and Communication Technologies (WICT), 2011 World Congress on
  • Conference_Location
    Mumbai
  • Print_ISBN
    978-1-4673-0127-5
  • Type

    conf

  • DOI
    10.1109/WICT.2011.6141376
  • Filename
    6141376