• DocumentCode
    1284366
  • Title

    Cost-Aware Rank Join with Random and Sorted Access

  • Author

    Martinenghi, D. ; Tagliasacchi, M.

  • Author_Institution
    Dipt. di Elettron. e Inf., Politec. di Milano, Milan, Italy
  • Volume
    24
  • Issue
    12
  • fYear
    2012
  • Firstpage
    2143
  • Lastpage
    2155
  • Abstract
    In this paper, we address the problem of joining ranked results produced by two or more services on the web. We consider services endowed with two kinds of access that are often available: 1) sorted access, which returns tuples sorted by score; 2) random access, which returns tuples matching a given join attribute value. Rank join operators combine objects of two or more relations and output the k combinations with the highest aggregate score. While the past literature has studied suitable bounding schemes for this setting, in this paper we focus on the definition of a pulling strategy, which determines the order of invocation of the joined services. We propose the Cost-Aware with Random and Sorted access (CARS) pulling strategy, which is derived at compile-time and is oblivious of the query-dependent score distributions. We cast CARS as the solution of an optimization problem based on a small set of parameters characterizing the joined services. We validate the proposed strategy with experiments on both real and synthetic data sets. We show that CARS outperforms prior proposals and that its overall access cost is always within a very short margin from that of an oracle-based optimal strategy. In addition, CARS is shown to be robust w.r.t. the uncertainty that may characterize the estimated parameters.
  • Keywords
    Web services; query processing; search engines; CARS pulling strategy; Web services; bounding schemes; cost-aware rank join; cost-aware with random and sorted access pulling strategy; join attribute value; optimization problem; oracle-based optimal strategy; parameter estimation; pulling strategy; query-dependent score distributions; rank join operators; real data sets; synthetic data sets; Context awareness; Information retrieval; Optimization; Relational databases; Search engines; Upper bound; Top-k; random access; rank join; sorted access;
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2011.161
  • Filename
    5963672