  • DocumentCode
    3268750
  • Title
    A probabilistic approach to metasearching with adaptive probing
  • Author
    Liu, Zhenyu; Luo, Chang; Cho, Junghoo; Chu, Wesley W.
  • Author_Institution
    Department of Computer Science, University of California, Los Angeles, CA, USA
  • fYear
    2004
  • fDate
    30 March–2 April 2004
  • Firstpage
    547
  • Lastpage
    558
  • Abstract
    An ever-increasing amount of valuable information is stored in Web databases, "hidden" behind search interfaces. To save users the effort of manually exploring each database, metasearchers automatically select the databases most relevant to a user's query. In this paper, we focus on one of the technical challenges in metasearching, namely database selection. Past research uses a precollected summary of each database to estimate its "relevancy" to the query and, in many cases, makes incorrect database selections. We propose two techniques: probabilistic relevancy modelling and adaptive probing. First, we model the relevancy of each database to a given query as a probability distribution derived by sampling that database. Using this probabilistic model, the user can explicitly specify a desired level of certainty for database selection. Second, the adaptive probing technique decides which databases, and how many, to contact in order to satisfy the user's requirement. Our experiments on real hidden-Web databases indicate that our approach significantly improves the accuracy of database selection at the cost of probing a small number of databases.
  • Keywords
    Internet; distributed databases; meta data; probability; query processing; Web databases; adaptive probing technique; database probing; hidden-Web databases; metasearchers; probabilistic relevancy modelling; Computer science; Costs; Databases; Merging; Sampling methods; Search engines; Web search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 20th International Conference on Data Engineering (ICDE 2004)
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-2065-0
  • Type
    conf
  • DOI
    10.1109/ICDE.2004.1320026
  • Filename
    1320026
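The abstract describes the selection loop only at a high level, so the following is a minimal Python sketch of the idea, built on several assumptions that do not come from the paper. Each database's relevancy (for instance, the number of documents matching the query) is a discrete distribution already estimated by sampling. Databases are treated as independent. The user wants the single most relevant database. Probing a database reveals its exact relevancy. The next database to probe is chosen greedily by one-step expected certainty. All function names (`prob_is_top`, `best_choice`, `adaptive_select`, and so on) are hypothetical, and the paper's own probing policy may differ.

```python
from typing import Dict, List, Tuple

# Assumption: a relevancy distribution is a list of (value, probability) pairs.
Dist = List[Tuple[float, float]]


def prob_leq(dist: Dist, v: float) -> float:
    """P(r <= v) for a discrete distribution."""
    return sum(p for x, p in dist if x <= v)


def prob_is_top(dists: Dict[str, Dist], j: str) -> float:
    """P(r_j >= r_i for every other database i).

    Assumes databases are independent; ties count as a correct selection.
    """
    total = 0.0
    for v, p in dists[j]:
        prod = p
        for i, d in dists.items():
            if i != j:
                prod *= prob_leq(d, v)
        total += prod
    return total


def best_choice(dists: Dict[str, Dist]) -> Tuple[str, float]:
    """Database most likely to be the top-1, with that probability (the certainty)."""
    return max(((j, prob_is_top(dists, j)) for j in dists), key=lambda pair: pair[1])


def expected_certainty_after_probe(dists: Dict[str, Dist], i: str) -> float:
    """Hypothetical greedy criterion: expected certainty if database i is probed."""
    expected = 0.0
    for v, p in dists[i]:
        after = dict(dists)
        after[i] = [(v, 1.0)]  # probing collapses i's distribution to a point mass
        expected += p * best_choice(after)[1]
    return expected


def adaptive_select(
    dists: Dict[str, Dist], true_relevancy: Dict[str, float], threshold: float
) -> Tuple[str, float, List[str]]:
    """Probe databases until the top-1 choice reaches the requested certainty."""
    dists = dict(dists)  # do not modify the caller's dictionary
    probed: List[str] = []
    name, cert = best_choice(dists)
    while cert < threshold:
        unprobed = [i for i in dists if i not in probed]
        if not unprobed:
            break
        target = max(unprobed, key=lambda i: expected_certainty_after_probe(dists, i))
        # Assumption: contacting the database returns its exact relevancy.
        dists[target] = [(true_relevancy[target], 1.0)]
        probed.append(target)
        name, cert = best_choice(dists)
    return name, cert, probed


if __name__ == "__main__":
    # Made-up sampled distributions and true values, for illustration only.
    sampled = {
        "A": [(10, 0.5), (40, 0.5)],
        "B": [(20, 0.8), (30, 0.2)],
        "C": [(5, 0.9), (50, 0.1)],
    }
    actual = {"A": 40, "B": 20, "C": 5}
    print(adaptive_select(sampled, actual, threshold=0.95))  # ('A', 1.0, ['A', 'C'])
```

With these made-up numbers, the loop probes A first because learning its value raises the expected certainty the most. Certainty then reaches 0.9, which is below the 0.95 threshold, so the loop probes C and stops at certainty 1.0 without contacting B. A lower threshold, such as 0.5, would stop after the single probe of A.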