Title :
Topic Signature Language Models for Ad hoc Retrieval
Author :
Zhou, Xiaohua ; Hu, Xiaohua ; Zhang, Xiaodan
Author_Institution :
Drexel Univ., Philadelphia
Abstract :
Semantic smoothing, which incorporates synonym and sense information into language models, is an effective and potentially significant way to improve retrieval performance. Previously implemented semantic smoothing models, such as the translation model, have shown good experimental results. However, these models are unable to incorporate contextual information. To overcome this limitation, we propose a novel context-sensitive semantic smoothing method that decomposes a document into a set of weighted context-sensitive topic signatures and then maps those topic signatures into query terms. The language model with such context-sensitive semantic smoothing is referred to as the topic signature language model. In detail, we implement two types of topic signatures, depending on whether an ontology exists in the application domain: one is the ontology-based concept and the other is the multiword phrase. The mapping probabilities from each topic signature to individual terms are estimated through the EM algorithm. Document models based on topic signature mapping are then derived. The new smoothing method is evaluated on the TREC 2004/2005 Genomics Track with ontology-based concepts, as well as on the TREC Ad Hoc Track (Disks 1, 2, and 3) with multiword phrases. Both experiments show significant improvements over the two-stage language model, as well as over the language model with context-insensitive semantic smoothing.
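The abstract names the steps (EM estimation of signature-to-term mapping probabilities, then a document model built from those mappings) but does not give the formulas. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes that the terms co-occurring with a signature are generated by a two-component mixture of the signature's own term distribution and a fixed background model, and that the resulting document model is linearly interpolated with a baseline model. The function names and the parameters `beta`, `lam`, and `iters` are hypothetical.

```python
from collections import Counter


def estimate_mapping(counts, background, beta=0.5, iters=50):
    """EM estimate of p(w | t) for one topic signature t (illustrative sketch).

    counts:     term counts pooled over the documents that contain t
    background: background model p(w | C); must be > 0 for every term in counts
    beta:       assumed weight of the background component in the mixture
    """
    total = sum(counts.values())
    # Initialise with the maximum-likelihood estimate.
    p = {w: c / total for w, c in counts.items()}
    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the
        # signature component rather than from the background.
        z = {
            w: (1 - beta) * p[w] / ((1 - beta) * p[w] + beta * background[w])
            for w in counts
        }
        # M-step: re-estimate p(w | t) from the expected signature counts.
        norm = sum(counts[w] * z[w] for w in counts)
        p = {w: counts[w] * z[w] / norm for w in counts}
    return p


def signature_doc_model(doc_signatures, mappings):
    """p_t(w | d) = sum over signatures t of p(w | t) * p_ml(t | d)."""
    total = sum(doc_signatures.values())
    model = Counter()
    for t, c in doc_signatures.items():
        for w, pw in mappings[t].items():
            model[w] += (c / total) * pw
    return dict(model)


def smoothed_prob(w, baseline, sig_model, lam=0.3):
    """Interpolate a baseline document model with the signature-based model."""
    return (1 - lam) * baseline.get(w, 0.0) + lam * sig_model.get(w, 0.0)
```

Under these assumptions, a term that is frequent in the background model (a common function word, say) loses probability mass during EM, while terms specific to the signature gain it. A query term that never occurs in a document can still receive nonzero probability through the signatures the document does contain.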
Keywords :
computational linguistics; document handling; expectation-maximisation algorithm; information retrieval; ontologies (artificial intelligence); EM algorithm; ad hoc retrieval; context-sensitive semantic smoothing; contextual information; document model; multiword phrase; ontology-based concept; query terms; sense information; synonym information; topic signature language model; topic signature mapping; translation model; weighted context-sensitive topic signatures; Background noise; Bioinformatics; Context modeling; Genomics; Helium; Information retrieval; Ontologies; Smoothing methods; Solid modeling; Training data; Concept; Information Retrieval; Language Models; Multiword Phrase; Semantic Smoothing; Topic Signature;
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
DOI :
10.1109/TKDE.2007.1058