Title :
A Distributed Parallel Algorithm for Web Page Inverted Indexes Construction on the Cluster Computing Systems
Author :
Zhengyou, Liang ; Tao, Chen
Author_Institution :
Dept. of Comput., Electron. & Inf., Guangxi Univ., Nanning, China
Abstract :
To address the low indexing speed of serial algorithms for constructing Web page inverted indexes, and observing that the merge-sort algorithm fits the theory of scheduling divisible loads in parallel and distributed systems, this paper proposes a new parallel algorithm based on triple sort-merge for Web page inverted index construction. The algorithm distributes and parallelizes the two most time-consuming tasks in inverted index construction: parsing terms and sorting the resulting term postings. Each term is represented as a triple, and the time complexity of the algorithm is analyzed. The paper also uses the Java middleware ProActive to design and implement a distributed parallel Web page indexer, named P_Indexer, on cluster computing systems. The algorithm analysis and experimental results show that the parallel algorithm achieves high efficiency and good scalability.
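The parse, sort, and merge pipeline summarized in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's P_Indexer: it is written in Python rather than Java, a thread pool stands in for the ProActive cluster nodes, and the triple layout `(term, doc_id, term_frequency)` is an assumption, since the abstract does not say what the triple contains. The names `parse_and_sort` and `build_index` are hypothetical.

```python
import heapq
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def parse_and_sort(chunk):
    """Worker step: parse pages into triples and sort them locally.

    Assumed triple layout: (term, doc_id, term_frequency).
    """
    triples = []
    for doc_id, text in chunk:
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        triples.extend((term, doc_id, tf) for term, tf in counts.items())
    triples.sort()  # each worker produces one sorted run
    return triples


def build_index(pages, workers=2):
    """Split the pages across workers, then k-way merge the sorted runs."""
    # Round-robin split: the page set is treated as a divisible load.
    chunks = [pages[i::workers] for i in range(workers)]
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(parse_and_sort, chunks))
    merged = heapq.merge(*runs)  # merge phase of the sort-merge
    # Group the merged triples by term to form the posting lists.
    return {
        term: [(doc_id, tf) for _, doc_id, tf in group]
        for term, group in groupby(merged, key=lambda t: t[0])
    }


pages = [
    (1, "web page index"),
    (2, "parallel index of web pages"),
    (3, "index index merge"),
]
index = build_index(pages)
print(index["index"])
```

Because every worker returns an already sorted run, the final step only has to merge the runs, which is the property that lets the sorting work be divided among nodes.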
Keywords :
Internet; Java; computational complexity; middleware; parallel algorithms; Java middleware; P_Indexer; ProActive; Web page inverted indexes construction; cluster computing systems; distributed parallel algorithm; merge-sort algorithm; scheduling theory; time complexity; Algorithm design and analysis; Clustering algorithms; Concurrent computing; Distributed computing; Processor scheduling; Scheduling algorithm; Sorting; Web pages; ProActive middleware; Web page indexer; distributed parallel; inverted indexes; text search
Conference_Title :
Information Technology and Applications, 2009. IFITA '09. International Forum on
Conference_Location :
Chengdu
Print_ISBN :
978-0-7695-3600-2
DOI :
10.1109/IFITA.2009.553