DocumentCode
1338833
Title
Relevance-Based Retrieval on Hidden-Web Text Databases without Ranking Support
Author
Hristidis, Vagelis ; Hu, Yuheng ; Ipeirotis, Panagiotis G.
Author_Institution
Sch. of Comput. & Inf. Sci., Florida Int. Univ., Miami, FL, USA
Volume
23
Issue
10
fYear
2011
Firstpage
1555
Lastpage
1568
Abstract
Many online or local data sources provide powerful querying mechanisms but limited ranking capabilities. For instance, PubMed allows users to submit highly expressive Boolean keyword queries, but ranks the query results by date only. However, a user would typically prefer a ranking by relevance, measured by an information retrieval (IR) ranking function. A naive approach would be to submit a disjunctive query with all query keywords, retrieve all the matching documents, and then rerank them. Unfortunately, such an operation would be very expensive due to the large number of results returned by disjunctive queries. In this paper, we present algorithms that return the top results for a query, ranked according to an IR-style ranking function, while operating on top of a source with a Boolean query interface with no ranking capabilities (or a ranking capability of no interest to the end user). The algorithms generate a series of conjunctive queries that return only documents that are candidates for being highly ranked according to a relevance metric. Our approach can also be applied to other settings where the ranking is monotonic on a set of factors (query keywords in IR) and the source query interface is a Boolean expression of these factors. Our comprehensive experimental evaluation on the PubMed database and a TREC data set shows that we achieve an order-of-magnitude improvement compared to the current baseline approaches.
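The contrast between the naive disjunctive approach and a conjunctive probing strategy can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a toy in-memory corpus (`DOCS`), a hypothetical `boolean_source` function standing in for an unranked Boolean interface that also supports NOT, and a simplified monotonic score that sums fixed per-keyword weights for the keywords a document contains. Real IR ranking functions also depend on term frequencies, which this sketch ignores.

```python
from itertools import combinations

# Toy corpus standing in for a hidden-web source (hypothetical data).
DOCS = {
    1: {"diabetes", "insulin", "immune"},
    2: {"diabetes", "insulin"},
    3: {"diabetes"},
    4: {"immune", "insulin"},
    5: {"cancer"},
}


def boolean_source(must, must_not=frozenset()):
    """Unranked Boolean interface (assumed): ids of documents that contain
    every term in `must` and no term in `must_not`."""
    return {d for d, terms in DOCS.items()
            if must <= terms and not (must_not & terms)}


def score(doc_id, weights):
    """Toy monotonic relevance: sum of the weights of query terms present."""
    return sum(w for t, w in weights.items() if t in DOCS[doc_id])


def naive_top_k(weights, k):
    """Baseline: retrieve the full disjunction, then rerank locally.
    Returns the top-k ids and the number of documents retrieved."""
    fetched = set()
    for t in weights:
        fetched |= boolean_source({t})
    ranked = sorted(fetched, key=lambda d: (-score(d, weights), d))
    return ranked[:k], len(fetched)


def conjunctive_top_k(weights, k):
    """Probe keyword subsets from highest to lowest total weight, asking for
    documents that match exactly that subset, and stop once k are collected.
    Under the toy score, every hit for a subset has that subset's weight sum,
    so hits arrive in non-increasing score order."""
    terms = set(weights)
    subsets = [frozenset(c) for r in range(1, len(terms) + 1)
               for c in combinations(sorted(terms), r)]
    subsets.sort(key=lambda s: (-sum(weights[t] for t in s), sorted(s)))
    results, fetched = [], 0
    for s in subsets:
        hits = sorted(boolean_source(s, terms - s))
        fetched += len(hits)
        results.extend(hits)
        if len(results) >= k:
            break
    return results[:k], fetched


# Weights chosen so that all subset sums are distinct (no score ties).
weights = {"diabetes": 4.0, "insulin": 2.0, "immune": 1.0}
naive_ids, naive_cost = naive_top_k(weights, 2)
conj_ids, conj_cost = conjunctive_top_k(weights, 2)
```

On this toy data both functions return the same two documents, but the conjunctive probe retrieves fewer documents from the source than the disjunctive baseline. The number of keyword subsets grows exponentially with query length, so this enumeration is only meant for short queries.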
Keywords
Boolean functions; database management systems; pattern matching; query processing; relevance feedback; text analysis; Boolean keyword queries; Boolean query interface; PubMed; document matching; hidden-Web text databases; information retrieval ranking function; naive approach; query keywords; querying mechanisms; ranking support; relevance-based retrieval; Databases; Diabetes; Immune system; Mathematical model; Maximum likelihood estimation; Probabilistic logic; Hidden-web databases; keyword search; top-k ranking
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2010.183
Filename
5590244