DocumentCode
479003
Title
Probability Model Based Hidden Databases Sampling Approach
Author
Jian-Wei Tian ; Shi-Jun Li ; Qi Lu
Author_Institution
Sch. of Comput., Wuhan Univ., Wuhan
fYear
2008
fDate
12-14 Oct. 2008
Firstpage
1
Lastpage
4
Abstract
A large portion of the data on the Web lies in the hidden databases of the deep Web, which can only be accessed through query interfaces. An efficient and uniform data sampling approach is important to other research, because data samples give insight into the quality, freshness, and size of the data in these databases. However, existing hidden database samplers are inefficient, because many queries are wasted in the sampling walks. In this paper, we propose a probability model based sampling approach to solve this problem. First, we leverage historical underflow walks to calculate the underflow probability of each attribute value. Then, based on these probabilities, we give priority to querying the attribute values with the largest underflow probability. The experimental results indicate that our approach improves sampling efficiency by detecting underflow earlier and avoiding many wasted queries.
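The abstract gives only the outline of the method, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes that a walk is a mapping from attributes to chosen values, that the underflow probability of a value is estimated as the smoothed fraction of past walks containing it that underflowed, and that "giving priority" means querying the most underflow-prone values first. The class name `UnderflowModel`, its methods, the Laplace smoothing, and the sample attributes are all hypothetical.

```python
from collections import defaultdict


class UnderflowModel:
    """Hypothetical model: per-value underflow statistics from past walks."""

    def __init__(self):
        self.seen = defaultdict(int)         # walks that used the (attribute, value) pair
        self.underflowed = defaultdict(int)  # of those, walks that ended in underflow

    def record_walk(self, assignment, underflow):
        """Store one finished walk; `assignment` maps attribute -> value."""
        for pair in assignment.items():
            self.seen[pair] += 1
            if underflow:
                self.underflowed[pair] += 1

    def probability(self, attribute, value):
        """Estimated underflow probability of one attribute value.

        Laplace smoothing is an assumption of this sketch, not something
        stated in the abstract; it gives unseen pairs a neutral 0.5.
        """
        pair = (attribute, value)
        return (self.underflowed[pair] + 1) / (self.seen[pair] + 2)

    def query_order(self, assignment):
        """Order the pairs so the most underflow-prone value is queried first."""
        return sorted(
            assignment.items(),
            key=lambda pair: self.probability(*pair),
            reverse=True,
        )


# Usage with made-up walks over a hypothetical car database.
model = UnderflowModel()
model.record_walk({"make": "Saab", "color": "red"}, underflow=True)
model.record_walk({"make": "Saab", "color": "blue"}, underflow=True)
model.record_walk({"make": "Ford", "color": "red"}, underflow=False)
print(model.query_order({"color": "red", "make": "Saab"}))
```

Under these assumptions, a new walk that picks `make=Saab` and `color=red` would issue the `make=Saab` predicate first, since that value has underflowed in every recorded walk, so an empty result would be detected with fewer queries.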
Keywords
database management systems; probability; query processing; attribute values; data quality; deep Web; hidden databases sampling; historical underflow; probability model; query interfaces; underflow probability; Data mining; Databases; Histograms; Probability; Query processing; Sampling methods; Search engines; Virtual manufacturing; Web sites;
fLanguage
English
Publisher
IEEE
Conference_Titel
Wireless Communications, Networking and Mobile Computing, 2008. WiCOM '08. 4th International Conference on
Conference_Location
Dalian
Print_ISBN
978-1-4244-2107-7
Electronic_ISBN
978-1-4244-2108-4
Type
conf
DOI
10.1109/WiCom.2008.2575
Filename
4680764