Title of article :
Lightweight integration of IR and DB for scalable hybrid search with integrated ranking support
Author/Authors :
Wang, Haofen; Tran, Thanh; Liu, Chang; Fu, Linyun
Issue Information :
Journal with serial issue number, year 2012
Abstract :
The Web contains a large amount of documents and an increasing quantity of structured data in the form of RDF triples. Many of these triples are annotations associated with documents. While structured queries constitute the principal means to retrieve structured data, keyword queries are typically used for document retrieval. Clearly, a form of hybrid search that seamlessly integrates these formalisms to query both textual and structured data can address more complex information needs. However, hybrid search in the large-scale Web environment faces several challenges. First, there is a need for repositories that can store and index a large amount of semantic data as well as textual data in documents, and manage them in an integrated way. Second, methods for hybrid query answering are needed to exploit the data from such an integrated repository. These methods should be fast and scalable, and in particular, they should support flexible ranking schemes to return not all but only the most relevant results. In this paper, we present CE2, an integrated solution that leverages mature information retrieval and database technologies to support large-scale hybrid search. For scalable and integrated management of data, CE2 integrates off-the-shelf database solutions with inverted indexes. Efficient hybrid query processing is supported through novel data structures and algorithms which allow advanced ranking schemes to be tightly integrated. Furthermore, a concrete ranking scheme is proposed to take features from both textual and structured data into account. Experiments conducted on DBpedia and Wikipedia show that CE2 can provide good performance in terms of both effectiveness and efficiency.
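The general idea of hybrid search described in the abstract, keyword matching through an inverted index combined with constraints over RDF-style annotations and a ranking that mixes textual and structured features, can be illustrated with a toy sketch. This is not CE2's data structures, algorithms, or ranking scheme, which this record does not describe. The class `HybridIndex`, its `search` method, the tf-idf text score, and the linear combination weighted by a hypothetical parameter `alpha` are all assumptions chosen for illustration.

```python
import heapq
import math
from collections import defaultdict


class HybridIndex:
    """Toy hybrid store: an inverted index over document text plus a triple set.

    Hypothetical illustration only; not CE2's actual design.
    """

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.triples = set()               # (subject, predicate, object) annotations
        self.num_docs = 0

    def add_document(self, doc_id, text, annotations):
        """Index the document's terms and store its (predicate, object) annotations."""
        self.num_docs += 1
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1
        for predicate, obj in annotations:
            self.triples.add((doc_id, predicate, obj))

    def text_score(self, doc_id, keywords):
        """Sum of tf * idf over query keywords (an assumed stand-in for IR scoring)."""
        score = 0.0
        for term in keywords:
            posting = self.postings.get(term.lower(), {})
            if doc_id in posting:
                idf = math.log(1 + self.num_docs / len(posting))
                score += posting[doc_id] * idf
        return score

    def search(self, keywords, constraints, k=3, alpha=0.5):
        """Return the top-k doc ids ranked by a mix of text and structured scores."""
        # Candidate generation: any document matching at least one keyword.
        candidates = set()
        for term in keywords:
            candidates.update(self.postings.get(term.lower(), {}))
        raw = {d: self.text_score(d, keywords) for d in candidates}
        max_text = max(raw.values(), default=0.0) or 1.0  # normalize text scores to [0, 1]
        scored = []
        for doc_id, t in raw.items():
            # Structured feature: fraction of (predicate, object) constraints satisfied.
            matched = sum((doc_id, p, o) in self.triples for p, o in constraints)
            struct = matched / len(constraints) if constraints else 0.0
            scored.append((alpha * t / max_text + (1 - alpha) * struct, doc_id))
        # Top-k selection instead of returning every match.
        return [doc for _, doc in heapq.nlargest(k, scored)]


idx = HybridIndex()
idx.add_document("d1", "Berlin is a city in Germany", [("type", "City"), ("country", "Germany")])
idx.add_document("d2", "Berlin wall history", [("type", "Event")])
idx.add_document("d3", "Munich city guide", [("type", "City"), ("country", "Germany")])
print(idx.search(["berlin", "city"], [("type", "City"), ("country", "Germany")]))
# -> ['d1', 'd3', 'd2']
```

In this sketch, `d3` outranks `d2` despite an equal keyword score because it satisfies both structured constraints. That is the kind of effect a ranking scheme drawing on both textual and structured features is meant to capture, though the weighting here is arbitrary.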
Keywords :
Hybrid search, Scalable query processing, Inverted index, Ranking, IR and DB integration
Journal title :
Web Semantics: Science, Services and Agents on the World Wide Web