Title :
Research and application of distributed parallel search Hadoop algorithm
Author_Institution :
Sch. of Inf. Sci. & Eng., Henan Univ. of Technol., Zhengzhou, China
Abstract :
Hadoop is an open-source distributed parallel computing platform composed mainly of the MapReduce algorithm and a distributed file system. This paper introduces Hadoop and its related technologies, and discusses in detail the idea and basic framework of the MapReduce algorithm, together with a parallelization method for the massive data involved in Internet search and its feasibility. The paper also puts forward an idea and strategy for using MapReduce to build inverted indexes of web pages in parallel.
Keywords :
Web services; file organisation; information retrieval; parallel algorithms; public domain software; search problems; Hadoop; Internet search; MapReduce algorithm; Web page inverted index; distributed file system; distributed parallel algorithm; open source computing; parallel processing; Distributed databases; Educational institutions; File systems; Indexes; Internet; Servers; inverted index; parallel computing
Conference_Title :
Systems and Informatics (ICSAI), 2012 International Conference on
Conference_Location :
Yantai
Print_ISBN :
978-1-4673-0198-5
DOI :
10.1109/ICSAI.2012.6223552
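
Illustrative sketch :
The abstract mentions using MapReduce to build an inverted index of web pages in parallel but gives no details of the strategy. The following is a minimal single-process sketch of the general map, shuffle and reduce pattern for inverted indexing, not the paper's method and not Hadoop's API. All function names (map_page, shuffle, reduce_postings, build_inverted_index), the page identifiers and the tokenization rule are assumptions made for illustration.

```python
import re
from collections import defaultdict


def map_page(doc_id, text):
    """Map step (hypothetical helper): emit one (term, doc_id) pair
    for each distinct term found on a page."""
    # Assumed tokenization: lowercase runs of ASCII letters and digits.
    terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [(term, doc_id) for term in terms]


def shuffle(pairs):
    """Shuffle step: group emitted doc ids by term. In a real cluster
    the framework performs this grouping between map and reduce."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_postings(term, doc_ids):
    """Reduce step (hypothetical helper): merge the doc ids for one
    term into a sorted, duplicate-free posting list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(pages):
    """Run map, shuffle and reduce in sequence over a dict that maps
    doc_id to page text. Each map_page call is independent of the
    others, which is what allows the map step to be parallelized."""
    pairs = []
    for doc_id, text in pages.items():
        pairs.extend(map_page(doc_id, text))
    grouped = shuffle(pairs)
    return dict(reduce_postings(term, ids) for term, ids in grouped.items())


if __name__ == "__main__":
    # Made-up sample pages, used only to exercise the sketch.
    pages = {
        "p1": "Hadoop runs MapReduce jobs",
        "p2": "MapReduce builds an inverted index",
    }
    for term, postings in sorted(build_inverted_index(pages).items()):
        print(term, postings)
```

Because each page is mapped independently and each term is reduced independently, both steps can in principle be spread across many machines; this sketch only simulates that flow in one process.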