• DocumentCode
    3322363
  • Title
    A Sampling-Based Approach to Information Recovery
  • Author
    Xie, Junyi ; Yang, Jun ; Chen, Yuguo ; Wang, Haixun ; Yu, Philip S.
  • Author_Institution
    Oracle Corp., Redwood City, CA
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    476
  • Lastpage
    485
  • Abstract
    There has been a recent resurgence of interest in research on noisy and incomplete data. Many applications require information to be recovered from such data. Ideally, an approach for information recovery should have the following features. First, it should be able to incorporate prior knowledge about the data, even if such knowledge is in the form of complex distributions and constraints for which no closed-form solutions exist. Second, it should be able to capture complex correlations, quantify the degree of uncertainty in the recovered data, and support queries over such data. The database community has developed a number of approaches for information recovery, but none is general enough to offer all of the above features. To overcome these limitations, we take a significantly more general approach to information recovery based on sampling. We apply sequential importance sampling, a technique from statistics that works for complex distributions and dramatically outperforms naive sampling when data is constrained. We illustrate the generality and efficiency of this approach in two application scenarios: cleansing RFID data, and recovering information from published data that has been summarized and randomized for privacy.
  • Keywords
    importance sampling; information retrieval; RFID data cleansing; complex distributions; information recovery; sequential importance sampling; Base stations; Books; Cities and towns; Computer science; Databases; Detectors; Publishing; Radiofrequency identification; Sampling methods; Statistics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2008.4497456
  • Filename
    4497456