DocumentCode
618297
Title
An architecture for extracting information from hidden web databases using intelligent agent technology through reinforcement learning
Author
Singh, Lavneet ; Sharma, D.K.
Author_Institution
Dept. of CEA, GLA Univ., Mathura, India
fYear
2013
fDate
11-12 April 2013
Firstpage
292
Lastpage
297
Abstract
The web contains an enormous amount of information, but only a small part of it is visible to users; a huge portion remains hidden. This is because traditional search engines cannot index or access all information. Such engines access only the information that can be retrieved by following hypertext links. Forms that traditional search engines do not access include those that involve a login or authorization process. The hidden web is the part of the web that traditional web crawlers do not access. An important problem in retrieving relevant, good-quality information from large hidden web databases is how to locate and identify their entry points, i.e., forms, on the web. Traditional web crawlers may be unable to retrieve all information from deep web databases, and this limitation is the main motivation for retrieving information from the deep web. Issues and challenges related to the problem are also discussed. An architecture for accessing hidden web databases that uses intelligent agent technology through reinforcement learning is proposed. The experimental results show that reinforcement learning helps overcome existing problems and that the approach outperforms existing hidden web crawlers in terms of precision and recall.
Keywords
distributed databases; hypermedia; information retrieval; learning (artificial intelligence); search engines; authorization process; deep Web; hidden World Wide Web database crawler; hypertext links; information extraction; information retrieving; intelligent agent technology; login process; reinforcement learning; search engines; Crawlers; Databases; Feature extraction; Intelligent agents; Learning (artificial intelligence); Search engines; Support vector machine classification; Hidden Web; Hidden Web Database; Hidden web crawling; Reinforcement learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information & Communication Technologies (ICT), 2013 IEEE Conference on
Conference_Location
JeJu Island
Print_ISBN
978-1-4673-5759-3
Type
conf
DOI
10.1109/CICT.2013.6558108
Filename
6558108