DocumentCode
3322363
Title
A Sampling-Based Approach to Information Recovery
Author
Xie, Junyi ; Yang, Jun ; Chen, Yuguo ; Wang, Haixun ; Yu, Philip S.
Author_Institution
Oracle Corp., Redwood City, CA
fYear
2008
fDate
7-12 April 2008
Firstpage
476
Lastpage
485
Abstract
There has been a recent resurgence of interest in research on noisy and incomplete data. Many applications require information to be recovered from such data. Ideally, an approach for information recovery should have the following features. First, it should be able to incorporate prior knowledge about the data, even if such knowledge is in the form of complex distributions and constraints for which no closed-form solutions exist. Second, it should be able to capture complex correlations, quantify the degree of uncertainty in the recovered data, and support queries over such data. The database community has developed a number of approaches for information recovery, but none is general enough to offer all of the above features. To overcome these limitations, we take a significantly more general approach to information recovery based on sampling. We apply sequential importance sampling, a technique from statistics that works for complex distributions and dramatically outperforms naive sampling when data is constrained. We illustrate the generality and efficiency of this approach in two application scenarios: cleansing RFID data, and recovering information from published data that has been summarized and randomized for privacy.
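The contrast the abstract draws between naive sampling and sequential importance sampling under constraints can be illustrated with a toy sketch. This is not the paper's algorithm or data: the problem (independent Bernoulli bits conditioned on an exact count of ones), the proposal distribution, and the function names `exact_marginal`, `naive_rejection`, and `sis_estimate` are all assumptions made for illustration.

```python
import itertools
import math
import random


def exact_marginal(p, k, idx=0):
    """Brute-force P(x[idx] = 1 | sum(x) = k); only feasible for small len(p)."""
    num = den = 0.0
    for ones in itertools.combinations(range(len(p)), k):
        chosen = set(ones)
        w = math.prod(pi if i in chosen else 1.0 - pi for i, pi in enumerate(p))
        den += w
        num += w * (idx in chosen)
    return num / den


def naive_rejection(p, k, n_draws, rng, idx=0):
    """Naive sampling: draw from the prior, discard draws that violate the constraint."""
    kept = hits = 0
    for _ in range(n_draws):
        x = [rng.random() < pi for pi in p]
        if sum(x) == k:
            kept += 1
            hits += x[idx]
    estimate = hits / kept if kept else float("nan")
    return estimate, kept


def sis_estimate(p, k, n_draws, rng, idx=0):
    """Toy sequential importance sampler (assumed proposal, not the paper's).

    Bits are drawn one at a time from a proposal that always ends with exactly
    k ones; the weight accumulates prior / proposal at every step.
    """
    n = len(p)
    num = den = 0.0
    for _ in range(n_draws):
        weight, ones, flag = 1.0, 0, False
        for i, pi in enumerate(p):
            q_one = (k - ones) / (n - i)  # 0 or 1 whenever the bit is forced
            bit = rng.random() < q_one
            prior = pi if bit else 1.0 - pi
            proposal = q_one if bit else 1.0 - q_one
            weight *= prior / proposal
            ones += bit
            if i == idx:
                flag = bit
        num += weight * flag
        den += weight
    return num / den  # self-normalized importance sampling estimate


if __name__ == "__main__":
    p = [0.05 + 0.01 * i for i in range(12)]  # made-up prior probabilities
    print("exact    :", exact_marginal(p, 6))
    print("SIS      :", sis_estimate(p, 6, 20000, random.Random(0)))
    print("rejection:", naive_rejection(p, 6, 20000, random.Random(1)))
```

In this setup the constraint is rarely satisfied under the prior, so rejection discards almost every draw, whereas every sequential draw is feasible and contributes a weighted sample. How well such a sampler works in practice depends heavily on the choice of proposal.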
Keywords
importance sampling; information retrieval; RFID data cleansing; complex distributions; information recovery; sequential importance sampling; Base stations; Books; Cities and towns; Computer science; Databases; Detectors; Publishing; Radiofrequency identification; Sampling methods; Statistics
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location
Cancun
Print_ISBN
978-1-4244-1836-7
Electronic_ISBN
978-1-4244-1837-4
Type
conf
DOI
10.1109/ICDE.2008.4497456
Filename
4497456