Title :
An Increment-Based Random Walk Approach to Sampling Hidden Databases
Author :
Zhao, Na ; Li, Qingzhong ; Yan, Zhongmin
Author_Institution :
Sch. of Comput. Sci. & Technol., Shandong Univ., Ji'nan
Abstract :
A vast amount of information is hidden behind form-like interfaces, which makes it difficult to capture the characteristics of the underlying databases, such as their topic and update frequency. This poses a great challenge for hidden web data integration. HIDDEN-DB-SAMPLER is the first algorithm to address this problem, but it does not consider the keyword attributes on the query interface. This paper presents increment-based random walk, a new technique applicable to any kind of attribute. The main idea is as follows: for keyword attributes, the approach incrementally obtains new values from the database, that is, it selects a value from the current sample and submits it to the interface, with a selection scheme designed to ensure the quality of the sampling; for other attributes, it works as RANDOM WALK does. An extensive set of experimental results demonstrates the accuracy and efficiency of our technique.
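The incremental step for keyword attributes described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy records, the `TOP_K` limit, the function names, and in particular the uniform random choice of the next keyword are all assumptions made for illustration, since the abstract does not specify the selection scheme.

```python
import random

# Hypothetical stand-in for a hidden database with one keyword attribute.
RECORDS = [
    {"id": 1, "title": "database sampling methods"},
    {"id": 2, "title": "random walk sampling"},
    {"id": 3, "title": "hidden web integration"},
    {"id": 4, "title": "web database query"},
    {"id": 5, "title": "query interface design"},
]
TOP_K = 2  # assumed interface limit: at most TOP_K results per query


def keyword_query(keyword):
    """Simulate the form interface: up to TOP_K records containing keyword."""
    matches = [r for r in RECORDS if keyword in r["title"].split()]
    return matches[:TOP_K]


def incremental_keyword_sample(seed, steps, rng):
    """Grow a sample by repeatedly querying with values drawn from it."""
    sample = {}          # record id -> record, so duplicates collapse
    vocabulary = [seed]  # keyword values obtained so far
    seen = {seed}
    for _ in range(steps):
        # Assumption: uniform choice; the paper's selection scheme may differ.
        keyword = rng.choice(vocabulary)
        for record in keyword_query(keyword):
            sample[record["id"]] = record
            # New keyword values are harvested from the returned records.
            for word in record["title"].split():
                if word not in seen:
                    seen.add(word)
                    vocabulary.append(word)
    return list(sample.values())


if __name__ == "__main__":
    result = incremental_keyword_sample("sampling", 50, random.Random(0))
    print(sorted(r["id"] for r in result))
```

Starting from a single seed value, each query both adds records to the sample and enlarges the pool of keyword values available for later queries, which is the sense in which values are obtained incrementally.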
Keywords :
Internet; database management systems; query processing; HIDDEN-DB-SAMPLER; form-like interface; hidden Web data integration; increment-based random walk; sampling hidden databases; Books; Computer science; Data structures; Databases; Frequency; Runtime; Sampling methods; Semantic Web; Software engineering; Hidden databases; increment-based; random walk; sampling;
Conference_Title :
Computer Science and Software Engineering, 2008 International Conference on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3336-0
DOI :
10.1109/CSSE.2008.595