DocumentCode
710108
Title
Ranking Candidate Networks of relations to improve keyword search over relational databases
Author
de Oliveira, Pericles ; da Silva, Altigran ; de Moura, Edleno
Author_Institution
Inst. de Comput., Univ. Fed. do Amazonas, Manaus, Brazil
fYear
2015
fDate
13-17 April 2015
Firstpage
399
Lastpage
410
Abstract
Relational keyword search (R-KwS) systems based on schema graphs take the keywords from the input query, find the tuples and tables where these keywords occur, and look for ways to “connect” these keywords using information on referential integrity constraints, i.e., key/foreign key pairs. The result is a number of expressions, called Candidate Networks (CNs), which join relations where keywords occur in a meaningful way. These CNs are then evaluated, resulting in a number of join networks of tuples (JNTs) that are presented to the user as ranked answers to the query. As the number of CNs is potentially very high, handling them is very demanding, both in terms of time and resources, so that, for certain queries, current systems may take too long to produce answers, and for others they may even fail to return results (e.g., by exhausting memory). Moreover, the quality of the CN evaluation may be compromised when a large number of CNs is processed. Based on observations made by other researchers and on our own findings on representative workloads, we argue that, although the number of possible Candidate Networks can be very high, only very few of them produce answers relevant to the user and are indeed worth processing. Thus, R-KwS systems can greatly benefit from methods for assessing the relevance of Candidate Networks, so that only those deemed relevant are evaluated. We propose in this paper an approach for ranking CNs, based on their probability of producing relevant answers to the user. This relevance is estimated based on the current state of the underlying database using a probabilistic Bayesian model we have developed. Experiments that we performed indicate that this model is able to place the relevant CNs among the top-4 in the ranking produced. In these experiments we also observed that processing only a few relevant CNs has a considerable positive impact, not only on the performance of processing keyword queries, but also on the quality of the results obtained.
Keywords
Bayes methods; graph theory; probability; query processing; relational databases; CN evaluation; JNTs; R-KwS systems; candidate network ranking; input query; join networks of tuples; keyword query processing; probabilistic Bayesian model; referential integrity constraints; relational keyword search system; schema graphs; Algebra; Indexes; Joints; Probabilistic logic
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location
Seoul
Type
conf
DOI
10.1109/ICDE.2015.7113301
Filename
7113301