DocumentCode :
618297
Title :
An architecture for extracting information from hidden web databases using intelligent agent technology through reinforcement learning
Author :
Singh, Lavneet ; Sharma, D.K.
Author_Institution :
Dept. of CEA, GLA Univ., Mathura, India
fYear :
2013
fDate :
11-12 April 2013
Firstpage :
292
Lastpage :
297
Abstract :
The web contains an enormous amount of information, but only a small portion of it is visible to users; a huge portion remains hidden. This is because traditional search engines cannot index or access all of it: they reach only the information that can be retrieved by following hypertext links, and they do not access content behind forms, including forms that require a login or authorization process. The hidden web refers to the part of the web that traditional web crawlers do not access. An important problem in retrieving relevant, high-quality information from large hidden web databases is how to find and identify their entry points, i.e., forms, on the Web. Because traditional web crawlers may be unable to retrieve all information from deep web databases, retrieving information from the deep web is the main motivation for this work. Issues and challenges related to the problem are also discussed. An architecture for accessing hidden web databases that uses intelligent agent technology through reinforcement learning is proposed. The experimental results show that reinforcement learning helps overcome existing problems and that the proposed approach outperforms existing hidden web crawlers in terms of precision and recall.
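The abstract reports results in terms of precision and recall. As a minimal sketch of what those two metrics measure, the snippet below assumes (hypothetically; the record does not describe the evaluation setup) that a crawler's output and a hand-labelled ground truth are both sets of form URLs. Precision is the fraction of retrieved forms that are relevant; recall is the fraction of relevant forms that were retrieved. The function name and example URLs are illustrative only, not the authors' code.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of a retrieved set against a ground-truth set.

    Hypothetical helper for illustration; empty sets yield 0.0 to avoid
    division by zero.
    """
    true_positives = len(retrieved & relevant)  # items both found and relevant
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: forms found by a crawler vs. a hand-labelled set of
# hidden-web entry points (search forms, not login forms).
found = {"a.example/search", "b.example/login", "c.example/query", "e.example/register"}
truth = {"a.example/search", "c.example/query", "d.example/find"}

p, r = precision_recall(found, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.50 recall=0.67"
```

Here two of the four retrieved forms are true entry points (precision 0.50), and two of the three true entry points were found (recall 0.67). A crawler that skips login and registration forms would raise precision without lowering recall.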
Keywords :
distributed databases; hypermedia; information retrieval; learning (artificial intelligence); search engines; authorization process; deep Web; hidden World Wide Web database crawler; hypertext links; information extraction; information retrieving; intelligent agent technology; login process; reinforcement learning; search engines; Crawlers; Databases; Feature extraction; Intelligent agents; Learning (artificial intelligence); Search engines; Support vector machine classification; Hidden Web; Hidden Web Database; Hidden web crawling; Reinforcement learning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information & Communication Technologies (ICT), 2013 IEEE Conference on
Conference_Location :
Jeju Island
Print_ISBN :
978-1-4673-5759-3
Type :
conf
DOI :
10.1109/CICT.2013.6558108
Filename :
6558108