DocumentCode :
2987103
Title :
Combining agents and Wrapper Induction for information gathering on restricted web domains
Author :
Albitar, S. ; Espinasse, Bernard ; Fournier, Sébastien
Author_Institution :
LSIS, Univ. d'Aix-Marseille, Marseille, France
fYear :
2010
fDate :
19-21 May 2010
Firstpage :
343
Lastpage :
352
Abstract :
The Web grows constantly and exponentially, which makes gathering relevant information increasingly infeasible. Existing index-based search engines ignore the context of information, which is essential for judging its relevance. When the search is restricted to a single web domain, a domain ontology can be used to take the related context into account, which may allow web pages belonging to that domain to be processed more intelligently. Nevertheless, the symbolic rules that exploit the domain's ontology for this processing are delicate and tedious to develop, especially for the information extraction task. This paper presents Boosted Wrapper Induction (BWI), a machine learning method for adaptive information extraction, and its use as a replacement for the symbolic approach to the information extraction task in AGATHE, a generic multi-agent architecture for information gathering on restricted web domains.
Keywords :
Computer architecture; Data mining; Large scale integration; Learning systems; Machine learning; Machine learning algorithms; Ontologies; Production; Service oriented architecture; Web pages; Information extraction; boosted wrapper induction; component; cooperative information gathering; machine learning; multi-agent systems
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Research Challenges in Information Science (RCIS), 2010 Fourth International Conference on
Conference_Location :
Nice, France
ISSN :
2151-1349
Print_ISBN :
978-1-4244-4839-5
Electronic_ISBN :
2151-1349
Type :
conf
DOI :
10.1109/RCIS.2010.5507394
Filename :
5507394