DocumentCode :
479003
Title :
Probability Model Based Hidden Databases Sampling Approach
Author :
Jian-Wei Tian ; Shi-Jun Li ; Qi Lu
Author_Institution :
Sch. of Comput., Wuhan Univ., Wuhan
fYear :
2008
fDate :
12-14 Oct. 2008
Firstpage :
1
Lastpage :
4
Abstract :
A large portion of the data on the Web lies in the hidden databases of the deep Web, which can be accessed only through their query interfaces. An efficient and uniform data sampling approach is important to other research work, because data samples give insight into the quality, freshness, and size of the data in these databases. However, existing hidden database samplers are very inefficient, because many queries are wasted in the sampling walks. In this paper, we propose a probability model based sampling approach to solve this problem. First, we use historical underflow walks to calculate the underflow probability of each attribute value. Then, based on these probabilities, we give priority to executing the attribute values with the largest underflow probability. The experimental results indicate that our approach can improve sampling efficiency by detecting underflows earlier and avoiding many wasted queries.
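The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. Every name here is hypothetical: `UnderflowModel`, `run_query`, and `sampling_walk`. Several details are also assumed rather than taken from the paper. The hidden database is modelled as an in-memory table behind a conjunctive equality query. A query "underflows" when it returns no rows and is "valid" when it returns between 1 and `k` rows. The underflow probability of an attribute value is estimated from past walks with Laplace smoothing, `(underflows + 1) / (seen + 2)`. Each walk fixes one random value per attribute and issues the predicates with the highest estimated underflow probability first, so a doomed walk stops after fewer queries. The sketch omits any acceptance/rejection (bias-correction) step that a uniform sampler would need.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Predicate = Tuple[str, str]  # (attribute, value)
Row = Dict[str, str]


class UnderflowModel:
    """Hypothetical per-attribute-value underflow estimator built from past walks.

    Uses Laplace smoothing, (underflows + 1) / (seen + 2); this formula is an
    assumption for illustration, not the paper's probability model.
    """

    def __init__(self) -> None:
        self.seen: Dict[Predicate, int] = defaultdict(int)
        self.underflows: Dict[Predicate, int] = defaultdict(int)

    def record_walk(self, predicates: Sequence[Predicate], underflowed: bool) -> None:
        # Every predicate issued in the walk shares the walk's outcome.
        for p in predicates:
            self.seen[p] += 1
            if underflowed:
                self.underflows[p] += 1

    def probability(self, p: Predicate) -> float:
        return (self.underflows[p] + 1) / (self.seen[p] + 2)


def run_query(db: List[Row], predicates: Sequence[Predicate]) -> List[Row]:
    """Stand-in for a hidden database's form interface: conjunctive equality match."""
    return [r for r in db if all(r[a] == v for a, v in predicates)]


def sampling_walk(
    db: List[Row],
    domains: Dict[str, List[str]],
    model: UnderflowModel,
    k: int,
    rng: random.Random,
) -> Tuple[Optional[Row], int]:
    """One drill-down walk; returns (sampled row or None, number of queries issued)."""
    # Fix one random value per attribute, as in a random drill-down walk.
    chosen = [(a, rng.choice(vals)) for a, vals in domains.items()]
    # Issue values with the largest estimated underflow probability first
    # (stable sort: ties keep the attribute order of `domains`).
    chosen.sort(key=model.probability, reverse=True)
    issued: List[Predicate] = []
    queries = 0
    for p in chosen:
        issued.append(p)
        result = run_query(db, issued)
        queries += 1
        if not result:  # underflow: the walk is wasted, so stop as early as possible
            model.record_walk(issued, underflowed=True)
            return None, queries
        if len(result) <= k:  # valid query: return one of its rows at random
            model.record_walk(issued, underflowed=False)
            return rng.choice(result), queries
    # Still overflowing with every attribute fixed (duplicate rows): no sample.
    model.record_walk(issued, underflowed=False)
    return None, queries
```

For example, suppose a past walk underflowed on `("make", "bmw")`. A later walk that picks `make = bmw` then queries it before `color`, and it stops after a single query instead of first issuing an overflowing `color` query.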
Keywords :
database management systems; probability; query processing; attribute values; data quality; deep Web; hidden databases sampling; historical underflow; probability model; query interfaces; underflow probability; Data mining; Databases; Histograms; Probability; Query processing; Sampling methods; Search engines; Virtual manufacturing; Web sites;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Wireless Communications, Networking and Mobile Computing, 2008. WiCOM '08. 4th International Conference on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-2107-7
Electronic_ISBN :
978-1-4244-2108-4
Type :
conf
DOI :
10.1109/WiCom.2008.2575
Filename :
4680764