DocumentCode :
3200027
Title :
Web services for information extraction from the Web
Author :
Habegger, Benjamin ; Quafafou, Mohamed
Author_Institution :
Lab. d'Informatique de Nantes Atlantique, Nantes Univ., France
fYear :
2004
fDate :
6-9 July 2004
Firstpage :
279
Lastpage :
286
Abstract :
Extracting information from the Web is a complex task with different components, which can be either generic or specific to the task: downloading a given page, following links, querying a Web-based application via an HTML form and the HTTP protocol, querying a Web service via the SOAP protocol, etc. Therefore, Web services that execute an information extraction task cannot simply be hard coded (i.e. written and compiled once and for all in a given programming language). To build flexible information extraction Web services, we need to be able to compose different subtasks. We propose an XML-based language to describe information extraction Web services as compositions of existing Web services and specific functions. The usefulness of the proposed framework is demonstrated by three real-world applications. (1) Search engines: we show how to describe a task which queries Google's Web service, retrieves more information on the results by querying their respective HTTP servers, and filters them according to this information. (2) E-commerce sites: an information extraction Web service is built that gives access to an existing HTML-based e-commerce online application such as Amazon. (3) Patent extraction: a last example shows how to describe an information extraction Web service that queries a Web-based application, extracts the set of result links, follows them, and extracts the needed information from the result pages. In all three applications the generated description can easily be modified and extended to better meet the user's needs and create value-added Web services.
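The composition idea in the abstract, as applied to the patent extraction example (query, extract result links, follow them, extract data), can be illustrated with a minimal Python sketch. This is not the authors' XML-based language: the `compose` helper, the step names, and the in-memory `FAKE_WEB` pages standing in for real HTTP requests are all assumptions made for illustration.

```python
from functools import reduce
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "Web" (URL -> HTML) used instead of real HTTP calls.
FAKE_WEB = {
    "http://example.org/search?q=xml": '<a href="/p/1">one</a> <a href="/p/2">two</a>',
    "http://example.org/p/1": "<title>Patent A</title>",
    "http://example.org/p/2": "<title>Patent B</title>",
}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


class TitleCollector(HTMLParser):
    """Collects the text content of the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def fetch(url):
    """Generic sub-task: 'download' a page, returning (url, html)."""
    return url, FAKE_WEB[url]


def extract_links(page):
    """Generic sub-task: absolute URLs of all links on a page."""
    url, html = page
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [urljoin(url, href) for href in parser.links]


def follow(urls):
    """Generic sub-task: fetch every linked page."""
    return [fetch(u) for u in urls]


def extract_titles(pages):
    """Task-specific sub-task: pull the title out of each result page."""
    records = []
    for url, html in pages:
        parser = TitleCollector()
        parser.feed(html)
        parser.close()
        records.append({"url": url, "title": parser.title})
    return records


def compose(*steps):
    """Chain sub-tasks left to right into a single task."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


# The whole extraction task is a composition of sub-tasks.
patent_task = compose(fetch, extract_links, follow, extract_titles)
results = patent_task("http://example.org/search?q=xml")
```

Because the task is only a list of steps, swapping `extract_titles` for another extractor or adding a filtering step changes the service without rewriting the rest, which is the kind of flexibility the abstract argues a hard-coded implementation lacks.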
Keywords :
Web sites; XML; electronic commerce; information filters; information retrieval; knowledge acquisition; search engines; Amazon; Google Web service; HTML; HTML-based e-commerce online application; HTTP protocol; HTTP servers; SOAP protocol; Web information extraction; Web links; Web page downloading; Web service querying; Web-based applications; XML-based language; e-commerce sites; information filtering; information retrieval; patent extraction; search engines; value-added Web services; Computer languages; Data mining; HTML; Information filtering; Information filters; Information retrieval; Search engines; Simple object access protocol; Web server; Web services;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Services, 2004. Proceedings. IEEE International Conference on
Print_ISBN :
0-7695-2167-3
Type :
conf
DOI :
10.1109/ICWS.2004.1314749
Filename :
1314749