DocumentCode :
2131511
Title :
Exploiting Data Semantics to Discover, Extract, and Model Web Sources
Author :
Ambite, José Luis ; Knoblock, Craig A. ; Lerman, Kristina ; Plangprasopchok, Anon ; Russ, Thomas ; Gazen, Cenk ; Minton, Steven ; Carman, Mark
Author_Institution :
Information Sciences Institute, University of Southern California, Marina del Rey, CA
fYear :
2008
fDate :
15-19 Dec. 2008
Firstpage :
771
Lastpage :
779
Abstract :
We describe Deimos, a system that automatically discovers and models new sources of information. The system exploits four core technologies developed by our group that make an end-to-end solution to this problem possible. First, given an example source, Deimos finds other similar sources online. Second, it invokes these sources and extracts data from them. Third, given the syntactic structure of a source, Deimos maps its inputs and outputs to semantic types. Finally, it infers the source's semantic definition, i.e., the function that maps the inputs to the outputs. Deimos is able to successfully automate these steps by exploiting a combination of background knowledge and data semantics. We describe the challenges in integrating separate components into a unified approach to discovering, extracting, and modeling new online sources. We provide an end-to-end validation of the system in two information domains to show that it can successfully discover and model new data sources in those domains.
Keywords :
Internet; data mining; Deimos; Web source discovery; data extraction; data semantics; end-to-end validation; Conferences; HTML; Informatics; Information resources; Labeling; Marine technology; USA Councils; Web services; schema modeling; semantic interoperability; semantic modeling; source modeling; tagging
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-0-7695-3503-6
Electronic_ISBN :
978-0-7695-3503-6
Type :
conf
DOI :
10.1109/ICDMW.2008.134
Filename :
4734005