DocumentCode :
562716
Title :
Component based effective web crawler and indexer using web services
Author :
Vadivel, A. ; Shaila, S.G. ; Mahalakshmi, R. Devi ; Karthika, J.
Author_Institution :
Dept. of Comput. Applic., Multimedia Inf. Retrieval Group, Nat. Inst. of Technol., Tamil Nadu, India
fYear :
2012
fDate :
30-31 March 2012
Firstpage :
792
Lastpage :
797
Abstract :
Designing and developing an effective web crawler is a challenging task for a large search engine. This paper proposes a component-based web crawler together with an indexer. The web crawler consists of crawler services and indexer services, both realized as web services, which communicate using XML, SOAP and WSDL. The crawler service fetches and parses web pages to retrieve all of their hyperlinks, and repeats this process recursively using a breadth-first strategy. The pages at the extracted URLs are downloaded and sent to the indexer service as messages. The indexer service parses the HTML pages, removes stop words and stems the keywords as pre-processing steps, and stores the result as an inverted index. We evaluated the performance of the proposed crawler-and-indexer design and found that the number of pages retrieved is notably high.
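The crawl-then-index pipeline described in the abstract can be illustrated with a minimal, offline Python sketch. It is not the authors' implementation: the SOAP/WSDL web-service layer is omitted (a direct function call stands in for the message passed from crawler to indexer), the in-memory `PAGES` dictionary replaces real HTTP fetching, and the names `crawl`, `build_index`, `stem` and `STOP_WORDS` are hypothetical. The stemmer is a crude suffix stripper, not the Porter algorithm.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import re

# Hypothetical in-memory "web" standing in for HTTP fetches so the sketch runs offline.
PAGES = {
    "http://example.org/": '<a href="/a">A</a> <a href="/b">B</a> Crawlers fetch pages',
    "http://example.org/a": '<a href="/b">B</a> The indexer stores an inverted index',
    "http://example.org/b": '<a href="/">home</a> Searching indexed pages',
}

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"the", "a", "an", "and", "of", "is", "to", "in"}


class PageParser(HTMLParser):
    """Collects href targets of <a> tags and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def parse(html):
    """Returns (list of raw hrefs, page text) for one HTML document."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; returns {url: text} in visiting order."""
    queue, seen, fetched = deque([seed]), {seed}, {}
    while queue and len(fetched) < max_pages:
        url = queue.popleft()          # FIFO queue gives breadth-first order
        html = fetch(url)
        if html is None:               # unreachable page: skip it
            continue
        links, text = parse(html)
        fetched[url] = text
        for href in links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return fetched


def stem(word):
    """Crude suffix stripping for illustration; NOT the Porter stemmer."""
    for suffix in ("ing", "ers", "er", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Inverted index: stemmed term -> sorted list of URLs containing it."""
    index = {}
    for url, text in docs.items():
        for token in re.findall(r"[a-z]+", text.lower()):  # tokenize
            if token in STOP_WORDS:                         # stop-word removal
                continue
            index.setdefault(stem(token), set()).add(url)
    return {term: sorted(urls) for term, urls in index.items()}


if __name__ == "__main__":
    docs = crawl("http://example.org/", PAGES.get)
    print(build_index(docs)["index"])
```

Run as a script, this prints the two URLs whose text contains a form of "index". Replacing `PAGES.get` with a real HTTP client, and the direct `build_index` call with a request to a separate indexer service, would move the sketch closer to the component split the paper describes.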
Keywords :
Web services; Web sites; indexing; object-oriented programming; search engines; HTML pages; SOAP; WSDL; Web pages; XML; breadth-first strategy; component based effective Web crawler; crawler services; design specification; hyperlinks; indexer services; inverted index; search engine; Crawlers; Engines; HTML; Servers; Simple object access protocol; World Wide Web; Inverted Index; Tokenization; URL; Web Crawler; Web Service;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Advances in Engineering, Science and Management (ICAESM), 2012 International Conference on
Conference_Location :
Nagapattinam, Tamil Nadu
Print_ISBN :
978-1-4673-0213-5
Type :
conf
Filename :
6215947