Title :
Keyphrase extraction-based query expansion in digital libraries
Author :
Song, Min ; Song, Il-Yeol ; Allen, Robert B. ; Obradovic, Zoran
Author_Institution :
Center for Inf. Sci. & Technol., Temple Univ., Philadelphia, PA
Abstract :
In pseudo-relevance feedback, the two factors that most affect retrieval performance are the source from which expansion terms are generated and the method used to rank those terms. In this paper, we present a novel unsupervised query expansion technique that uses keyphrases and POS (part-of-speech) phrase categorization. Keyphrases are extracted from the retrieved documents and weighted with an algorithm based on information gain and phrase co-occurrence. The selected keyphrases are then translated into disjunctive normal form (DNF), guided by POS phrase categorization, for better query reformulation. We also study whether ontologies such as WordNet and MeSH improve retrieval performance when used in conjunction with the keyphrases. We test our techniques on TREC 5, 6, and 7 as well as a MEDLINE collection. The experimental results show that using keyphrases with POS phrase categorization produces the best average precision.
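The abstract does not give the exact weighting formula or the rules for the DNF translation, so the following is only a rough illustration under stated assumptions, not the authors' algorithm. It scores candidate phrases with standard information gain, treating the top-ranked retrieved documents as pseudo-relevant and the rest as non-relevant. It then OR-s one AND-clause per selected phrase into a DNF query. The co-occurrence component and the POS phrase categorization are omitted. All function names (`information_gain`, `select_keyphrases`, `to_dnf`) and the toy data are hypothetical.

```python
import math
from typing import List, Set


def entropy(p: float) -> float:
    """Binary entropy in bits; 0 * log(0) is taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(phrase: str, docs: List[Set[str]], relevant: List[bool]) -> float:
    """Standard IG of the feature 'phrase occurs in doc' w.r.t. the pseudo-relevance label."""
    n = len(docs)
    base = entropy(sum(relevant) / n)
    with_p = [r for d, r in zip(docs, relevant) if phrase in d]
    without_p = [r for d, r in zip(docs, relevant) if phrase not in d]
    conditional = 0.0
    for subset in (with_p, without_p):
        if subset:
            conditional += len(subset) / n * entropy(sum(subset) / len(subset))
    return base - conditional


def select_keyphrases(docs: List[Set[str]], relevant: List[bool], k: int) -> List[str]:
    """Assumption: candidates are phrases found in at least one pseudo-relevant doc.
    Ties in IG are broken alphabetically so the ranking is deterministic."""
    candidates = set().union(*(d for d, r in zip(docs, relevant) if r))
    ranked = sorted(candidates, key=lambda p: (-information_gain(p, docs, relevant), p))
    return ranked[:k]


def to_dnf(query_terms: List[str], keyphrases: List[str]) -> str:
    """Hypothetical DNF: the original query and each keyphrase become AND-clauses of
    their words, and the clauses are OR-ed together (no POS categorization applied)."""
    clauses = [" AND ".join(query_terms)] + [" AND ".join(p.split()) for p in keyphrases]
    return " OR ".join(f"({c})" for c in clauses)


if __name__ == "__main__":
    # Toy feedback set: each retrieved document is represented by its extracted phrases.
    docs = [
        {"query expansion", "relevance feedback"},
        {"query expansion", "wordnet"},
        {"protein folding"},
        {"gene expression"},
    ]
    relevant = [True, True, False, False]  # top-2 documents assumed pseudo-relevant
    top = select_keyphrases(docs, relevant, k=2)
    print(top)  # ['query expansion', 'relevance feedback']
    print(to_dnf(["keyphrase", "extraction"], top))
```

Restricting candidates to phrases from the pseudo-relevant documents matters here because information gain is symmetric: without the filter, a phrase that occurs only in non-relevant documents could score as highly as one that occurs only in relevant ones.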
Keywords :
document handling; ontologies (artificial intelligence); query formulation; query processing; relevance feedback; unsupervised learning; MEDLINE collection; POS phrase categorization; digital libraries; disjunctive normal form; documents retrieval; keyphrase extraction; ontologies; pseudorelevance feedback; query reformulation; retrieval performance; unsupervised query expansion technique; Data mining; Educational institutions; Feedback; Information retrieval; Information science; Ontologies; Proteins; Software libraries; Speech; Testing; POS; WordNet; information gain; keyphrase extraction; query expansion;
Conference_Title :
Digital Libraries, 2006. JCDL '06. Proceedings of the 6th ACM/IEEE-CS Joint Conference on
Conference_Location :
Chapel Hill, NC
Print_ISBN :
1-59593-354-9
DOI :
10.1145/1141753.1141800