An architecture for extracting information from hidden web databases using intelligent agent technology through reinforcement learning

Author

Singh, Lavneet ; Sharma, D.K.

Author_Institution

Dept. of CEA, GLA Univ., Mathura, India

fYear

2013

fDate

11-12 April 2013

Firstpage

292

Lastpage

297

Abstract

The web contains an enormous amount of information, but only a small part of it is visible to users; a huge portion remains hidden. This is because traditional search engines cannot index or access all of it: they retrieve only the information that can be reached by following hypertext links. Content behind forms, including those that involve a login or authorization process, is not accessed by them. The hidden web is the part of the web that traditional web crawlers do not access. An important problem in retrieving relevant, good-quality information from a huge hidden web database is how to find and identify its entry points, i.e., forms, on the web. Because traditional web crawlers may be unable to retrieve all information from deep web databases, retrieving information from the deep web is the main motivation for this work. Issues and challenges related to the problem are also discussed. An architecture for accessing hidden web databases that uses intelligent agent technology through reinforcement learning is proposed. The experimental results show that reinforcement learning helps overcome existing problems and outperforms existing hidden web crawlers in terms of precision and recall.

Keywords

distributed databases; hypermedia; information retrieval; learning (artificial intelligence); search engines; authorization process; deep Web; hidden World Wide Web database crawler; hypertext links; information extraction; information retrieving; intelligent agent technology; login process; reinforcement learning; Crawlers; Databases; Feature extraction; Intelligent agents; Support vector machine classification; Hidden Web; Hidden Web Database; Hidden web crawling

fLanguage

English

Publisher

IEEE

Conference_Titel

Information & Communication Technologies (ICT), 2013 IEEE Conference on

Conference_Location

JeJu Island

Print_ISBN

978-1-4673-5759-3

Type

conf

DOI

10.1109/CICT.2013.6558108

Filename

6558108