Regional Information Center for Science and Technology - Extracting Output Metadata from Scientific Deep Web Data Sources

DocumentCode :

2771845

Title :

Extracting Output Metadata from Scientific Deep Web Data Sources

Author :

Wang, Fan ; Agrawal, Gagan

Author_Institution :

Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA

fYear :

2009

fDate :

6-9 Dec. 2009

Firstpage :

552

Lastpage :

561

Abstract :

Increasingly, many data sources appear as online databases hidden behind query forms, thus forming the deep Web. The popularity of this new medium for data dissemination is leading to new problems in data integration. In particular, to enable data integration from multiple deep Web data sources, one needs to obtain the metadata for each of the data sources. Obtaining the metadata, particularly the output schema, can be very challenging. This is because, given an input query, many deep Web data sources return only a subset of the output schema attributes, i.e., the ones that have a non-NULL value for the corresponding input. In this paper, we propose two approaches, the sampling model approach and the mixture model approach, to efficiently obtain an approximately complete set of output schema attributes from a deep Web data source. Our experiments show that, while each of the two approaches has limitations, a hybrid strategy that combines them achieves high recall with good precision for most data sources.

Keywords :

Internet; meta data; data dissemination; data integration; online databases; output metadata extraction; sampling model; scientific deep Web data sources; Computer science; Data engineering; Data mining; Databases; Documentation; HTML; Humans; Sampling methods; USA Councils; Web pages; deep web; schema extraction;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Data Mining, 2009. ICDM '09. Ninth IEEE International Conference on

Conference_Location :

Miami, FL

ISSN :

1550-4786

Print_ISBN :

978-1-4244-5242-2

Electronic_ISBN :

1550-4786

Type :

conf

DOI :

10.1109/ICDM.2009.41

Filename :

5360281

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2771845