DocumentCode :
3183053
Title :
A parallel approach to context-based term weighting
Author :
Arora, Silky ; Chakravarty, Shampa
Author_Institution :
Dept. of Inf. Technol., Netaji Subhas Inst. of Technol., New Delhi, India
fYear :
2011
fDate :
11-14 Dec. 2011
Firstpage :
951
Lastpage :
956
Abstract :
Information retrieval and extraction essentially rely on estimating the relevance of words present in a large corpus of documents or text. One approach to measuring relevance is to analyze the importance of words based on their statistical distribution within a document. Quite another approach derives from their linguistic relevance within a logically perceived context. The literature presents a body of work employing both statistical and contextual approaches. The current challenge is to enhance the performance of document analysis and clustering systems. With the massive explosion of information and raw data available on the web, its analysis demands more rigorous computation and processing. Given that a widely distributed environment serves as the backbone platform on which these systems operate, there is an urgent need to develop techniques that scale up their performance on multiple processors. We propose a parallelized strategy to estimate the statistical as well as contextual relevance of words, employing a master-slave configuration on a cluster of processors. Our parallel algorithm has been successfully tested on a self-made Beowulf cluster comprising ten nodes, showing a significant performance improvement over a single processor.
Keywords :
information retrieval; parallel algorithms; pattern clustering; statistical distributions; text analysis; Beowulf cluster; clustering system; context-based term weighting; document analysis; document corpus; information extraction; information retrieval; linguistic relevance; parallel algorithm; parallel approach; statistical distribution; text corpus; Algorithm design and analysis; Clustering algorithms; Context; Measurement; Pragmatics; Program processors; Switches; Amdahl's Law; Cluster computing; Context based Text Classification; Retrieval; TF-IDF;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information and Communication Technologies (WICT), 2011 World Congress on
Conference_Location :
Mumbai
Print_ISBN :
978-1-4673-0127-5
Type :
conf
DOI :
10.1109/WICT.2011.6141376
Filename :
6141376