Ranking Candidate Networks of relations to improve keyword search over relational databases

Author

de Oliveira, Pericles ; da Silva, Altigran ; de Moura, Edleno

Author_Institution

Inst. de Comput., Univ. Fed. do Amazonas, Manaus, Brazil

fYear

2015

fDate

13-17 April 2015

Firstpage

399

Lastpage

410

Abstract

Relational keyword search (R-KwS) systems based on schema graphs take the keywords from the input query, find the tuples and tables where these keywords occur, and look for ways to “connect” these keywords using information on referential integrity constraints, i.e., key/foreign key pairs. The result is a number of expressions, called Candidate Networks (CNs), which join relations where keywords occur in a meaningful way. These CNs are then evaluated, resulting in a number of join networks of tuples (JNTs) that are presented to the user as ranked answers to the query. As the number of CNs is potentially very high, handling them is very demanding in terms of both time and resources, so that, for certain queries, current systems may take too long to produce answers, and for others they may even fail to return results (e.g., by exhausting memory). Moreover, the quality of the CN evaluation may be compromised when a large number of CNs is processed. Based on observations made by other researchers and on our own findings on representative workloads, we argue that, although the number of possible Candidate Networks can be very high, only very few of them produce answers relevant to the user and are indeed worth processing. Thus, R-KwS systems can greatly benefit from methods for assessing the relevance of Candidate Networks, so that only those deemed relevant are evaluated. We propose in this paper an approach for ranking CNs based on their probability of producing relevant answers to the user. This relevance is estimated based on the current state of the underlying database using a probabilistic Bayesian model we have developed. Experiments that we performed indicate that this model is able to place the relevant CNs among the top-4 in the ranking produced. In these experiments we also observed that processing only a few relevant CNs has a considerable positive impact, not only on the performance of processing keyword queries, but also on the quality of the results obtained.

Keywords

Bayes methods; graph theory; probability; query processing; relational databases; CN evaluation; JNTs; R-KwS systems; candidate network ranking; input query; join networks of tuples; keyword query processing; probabilistic Bayesian model; referential integrity constraints; relational databases; relational keyword search system; schema graphs; Algebra; Bayes methods; Indexes; Joints; Probabilistic logic; Relational databases;

fLanguage

English

Publisher

IEEE

Conference_Titel

Data Engineering (ICDE), 2015 IEEE 31st International Conference on

Conference_Location

Seoul

Type

conf

DOI

10.1109/ICDE.2015.7113301

Filename

7113301

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=710108