DocumentCode :
1468515
Title :
Efficient Extended Boolean Retrieval
Author :
Pohl, Stefan ; Moffat, Alistair ; Zobel, Justin
Author_Institution :
Dept. of Comput. Sci. & Software Eng., Univ. of Melbourne, Melbourne, VIC, Australia
Volume :
24
Issue :
6
fYear :
2012
fDate :
6/1/2012 12:00:00 AM
Firstpage :
1014
Lastpage :
1024
Abstract :
Extended Boolean retrieval (EBR) models were proposed nearly three decades ago, but have had little practical impact, despite their significant advantages compared to either ranked keyword or pure Boolean retrieval. In particular, EBR models produce meaningful rankings; their query model allows the representation of complex concepts in an and-or format; and they are scrutable, in that the score assigned to a document depends solely on the content of that document, unaffected by any collection statistics or other external factors. These characteristics make EBR models attractive in domains typified by medical and legal searching, where the emphasis is on iterative development of reproducible complex queries of dozens or even hundreds of terms. However, EBR is much more computationally expensive than the alternatives. We consider the implementation of the p-norm approach to EBR, and demonstrate that ideas used in the max-score and wand exact optimization techniques for ranked keyword retrieval can be adapted to allow selective bypass of documents via a low-cost screening process for this and similar retrieval models. We also propose term-independent bounds that are able to further reduce the number of score calculations for short, simple queries under the extended Boolean retrieval model. Together, these methods yield an overall saving from 50 to 80 percent of the evaluation cost on test queries drawn from biomedical search.
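The p-norm scoring that the abstract refers to can be illustrated with a minimal sketch. It uses the standard unweighted p-norm formulas for AND and OR nodes. It assumes binary term scores (1 if the term occurs in the document, 0 otherwise) and a nested-tuple query representation. The function names and the query encoding are hypothetical, chosen for illustration only; this is not the authors' implementation and includes none of their optimizations.

```python
from typing import Sequence, Set, Tuple, Union

# Hypothetical query encoding: a term string, or (operator, [children]).
Query = Union[str, Tuple[str, list]]


def p_norm_or(scores: Sequence[float], p: float) -> float:
    """Unweighted p-norm OR: the p-th root of the mean of the p-th powers."""
    return (sum(s ** p for s in scores) / len(scores)) ** (1.0 / p)


def p_norm_and(scores: Sequence[float], p: float) -> float:
    """Unweighted p-norm AND: one minus the p-norm distance from all-ones."""
    return 1.0 - (sum((1.0 - s) ** p for s in scores) / len(scores)) ** (1.0 / p)


def score(query: Query, doc_terms: Set[str], p: float = 2.0) -> float:
    """Recursively score one document against an and-or query tree.

    Assumption: leaf scores are binary term presence, so the result
    depends only on the document's own content.
    """
    if isinstance(query, str):
        return 1.0 if query in doc_terms else 0.0
    op, children = query
    child_scores = [score(child, doc_terms, p) for child in children]
    if op == "AND":
        return p_norm_and(child_scores, p)
    if op == "OR":
        return p_norm_or(child_scores, p)
    raise ValueError(f"unknown operator: {op!r}")


# Example query: heart AND (attack OR infarction)
example_query = ("AND", ["heart", ("OR", ["attack", "infarction"])])
example_score = score(example_query, {"heart", "infarction"}, p=2.0)
```

With p = 1 both operators reduce to the arithmetic mean of the child scores, and as p grows they approach strict Boolean behaviour. Every document containing at least one query term receives a nonzero score under this scheme, which suggests why exhaustive evaluation is costly and why bypassing documents through a cheap screening step is attractive.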
Keywords :
Boolean functions; document handling; information retrieval; statistical analysis; EBR; complex concepts; document content; efficient extended Boolean retrieval; keyword retrieval; legal searching; medical searching; optimization techniques; statistics collection; Biological system modeling; Computational modeling; Law; Optimization; Query processing; Systematics; Document-at-a-time; efficiency; extended Boolean retrieval; p-norm; query processing
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2011.63
Filename :
5728812