An Ontology-Based Topical Crawling Algorithm for Accessing Deep Web Content

Author

Arya, K.V. ; Vadlamudi, B.R.

Author_Institution

ABV-IIITM, Gwalior, India

fYear

2012

fDate

23-25 Nov. 2012

Firstpage

1

Lastpage

6

Abstract

Given the volume of the Web and its frequency of content change, the coverage and quality of pages retrieved by modern search engines are relatively small, since they crawl only hypertext links and ignore the search forms that are the entry points to deep web content, where two-thirds of the information resides. In this paper, an algorithm is designed that enables topical crawlers to access hidden web content by using a domain-based ontology to determine each form's relevance to the domain. The domain considered in this work is scientific research publications. Experimental results show that the proposed approach outperforms keyword-based crawlers in terms of both relevancy and completeness.
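The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of judging a search form's relevance against a domain ontology. Everything in it is an assumption made for illustration: the tiny `ONTOLOGY` dictionary, the `tokenize` and `form_relevance` helpers, the `FormTermExtractor` class, and the scoring rule (the fraction of ontology concepts matched by a form's labels and field names) are hypothetical and are not taken from the paper.

```python
import re
from html.parser import HTMLParser

# Hypothetical mini-ontology for the scientific publications domain:
# concept -> set of synonyms. Illustrative only, not from the paper.
ONTOLOGY = {
    "author": {"author", "writer", "creator"},
    "title": {"title", "heading"},
    "publication": {"journal", "conference", "proceedings", "publication"},
    "year": {"year", "date"},
    "keyword": {"keyword", "keywords", "topic", "subject"},
}


def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class FormTermExtractor(HTMLParser):
    """Collects words from label text and control names inside <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.in_label = False
        self.terms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag == "label":
            self.in_label = True
        elif self.in_form and tag in ("input", "select", "textarea"):
            attributes = dict(attrs)
            # Hidden fields and submit buttons say little about the domain.
            if attributes.get("type") in ("hidden", "submit"):
                return
            for key in ("name", "placeholder"):
                if attributes.get(key):
                    self.terms += tokenize(attributes[key])

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False
        elif tag == "label":
            self.in_label = False

    def handle_data(self, data):
        if self.in_form and self.in_label:
            self.terms += tokenize(data)


def form_relevance(html):
    """Return the fraction of ontology concepts matched by the form's terms."""
    parser = FormTermExtractor()
    parser.feed(html)
    terms = set(parser.terms)
    matched = [c for c, synonyms in ONTOLOGY.items() if synonyms & terms]
    return len(matched) / len(ONTOLOGY)


publication_form = (
    '<form><label>Author</label><input name="author">'
    '<label>Title</label><input name="title">'
    '<input name="year"><input type="submit" value="Search"></form>'
)
login_form = (
    '<form><input name="username">'
    '<input type="password" name="password"></form>'
)

print(form_relevance(publication_form))
print(form_relevance(login_form))
```

Under these assumptions, a crawler could submit queries only to forms whose score exceeds some chosen threshold, so that a publication search form is kept and a login form is skipped. The threshold and the scoring rule would need tuning against real forms.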

Keywords

Internet; information retrieval; ontologies (artificial intelligence); search engines; Web information; deep Web content access; domain based ontology; hypertext links; keyword based crawlers; ontology-based topical crawling algorithm; scientific research publications domain; topical crawlers; Arrays; Crawlers; Databases; HTML; Ontologies; Web pages; Deep web; Domain ontology; Focused crawler; Form processing

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer and Communication Technology (ICCCT), 2012 Third International Conference on

Conference_Location

Allahabad

Print_ISBN

978-1-4673-3149-4

Type

conf

DOI

10.1109/ICCCT.2012.10

Filename

6394657