Title :
An Ontology-Based Topical Crawling Algorithm for Accessing Deep Web Content
Author :
Arya, K.V. ; Vadlamudi, B.R.
Author_Institution :
ABV-IIITM, Gwalior, India
Abstract :
Given the volume of the Web and the frequency with which its content changes, the coverage and quality of pages retrieved by modern search engines are relatively small, because these engines crawl only hypertext links and ignore search forms, which are the entry points to deep web content, where two-thirds of the information resides. In this paper, an algorithm is designed that enables topical crawlers to access hidden web content by using a domain-based ontology to determine each form's relevance to the domain. This work considers the domain of scientific research publications. Experimental results show that the proposed approach outperforms keyword-based crawlers in terms of both relevancy and completeness.
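As an illustration only, the form-relevance step summarized above can be sketched as concept matching between the text of a search form (labels and field names) and a domain ontology. The miniature ontology, the concept-coverage score, and the 0.4 threshold below are hypothetical assumptions for this sketch; they are not the authors' algorithm or data.

```python
import re

# Hypothetical miniature ontology for the scientific research publications
# domain (an assumption, not the paper's ontology): each concept maps to
# terms that may appear in a form's labels or field names.
PUBLICATION_ONTOLOGY = {
    "Author": {"author", "authors", "writer", "creator"},
    "Title": {"title", "paper"},
    "Venue": {"journal", "conference", "proceedings", "venue"},
    "Year": {"year", "date"},
    "Subject": {"keyword", "keywords", "subject", "topic", "abstract"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def form_relevance(form_texts, ontology=PUBLICATION_ONTOLOGY):
    """Return the fraction of ontology concepts matched by any form token."""
    tokens = set()
    for text in form_texts:
        tokens.update(tokenize(text))
    matched = {concept for concept, terms in ontology.items() if tokens & terms}
    return len(matched) / len(ontology)


def is_relevant(form_texts, threshold=0.4):
    """Treat the form as a domain entry point if enough concepts are covered.

    The default threshold is an illustrative assumption.
    """
    return form_relevance(form_texts) >= threshold


if __name__ == "__main__":
    library_form = ["Author name", "Paper title", "Journal", "Publication year"]
    flight_form = ["From", "To", "Departure date", "Passengers"]
    print(form_relevance(library_form), is_relevant(library_form))
    print(form_relevance(flight_form), is_relevant(flight_form))
```

In this sketch, a bibliographic search form covers four of the five hypothetical concepts (score 0.8) and is kept, while a flight-booking form matches only the "Year" concept through "date" (score 0.2) and is skipped. A keyword-based crawler, by contrast, would need the form text to contain a specific keyword rather than any term linked to a domain concept.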
Keywords :
Internet; information retrieval; ontologies (artificial intelligence); search engines; Web information; deep Web content access; domain based ontology; hypertext links; keyword based crawlers; ontology-based topical crawling algorithm; scientific research publications domain; search engines; topical crawlers; Arrays; Crawlers; Databases; HTML; Ontologies; Search engines; Web pages; Deep web; Domain ontology; Focused crawler; Form processing;
Conference_Title :
Computer and Communication Technology (ICCCT), 2012 Third International Conference on
Conference_Location :
Allahabad
Print_ISBN :
978-1-4673-3149-4
DOI :
10.1109/ICCCT.2012.10