DocumentCode :
892834
Title :
From Wrapping to Knowledge
Author :
Arjona, José L. ; Corchuelo, Rafael ; Ruiz, David ; Toro, Miguel
Author_Institution :
Departamento de Electronica, Sistemas Informaticos y Automatica, Escuela Politecnica Superior, Huelva
Volume :
19
Issue :
2
fYear :
2007
Firstpage :
310
Lastpage :
323
Abstract :
One of the most challenging problems for enterprise information integration is to deal with heterogeneous information sources on the Web. The reason is that they usually provide information that is in human-readable form only, which makes it difficult for a software agent to understand it. Current solutions build on the idea of annotating the information with semantics. If the information is unstructured, proposals such as S-CREAM, MnM, or Armadillo may be effective enough since they rely on using natural language processing techniques; furthermore, their accuracy can be improved by using redundant information on the Web, as C-PANKOW has proved recently. If the information is structured and closely related to a back-end database, deep annotation ranges among the most effective proposals, but it requires the information providers to modify their applications; if deep annotation is not applicable, the easiest solution consists of using a wrapper and transforming its output into annotations. In this paper, we prove that this transformation can be automated by means of an efficient, domain-independent algorithm. To the best of our knowledge, this is the first attempt to devise and formalize such a systematic, general solution.
Keywords :
Internet; business data processing; natural language processing; software agents; text analysis; C-PANKOW; S-CREAM; World Wide Web; back-end database; deep annotation; domain-independent algorithm; enterprise information integration; human-readable form only; information sources; natural language processing techniques; redundant information; semiautomatic annotation; software agent; wrappers; Application software; Data mining; Databases; Information resources; Natural language processing; Proposals; Semantic Web; Software agents; Web pages; Wrapping
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2007.31
Filename :
4039292