DocumentCode :
1670996
Title :
Extending vector space model for XML ranking
Author :
He, Weimin ; Lv, Teng
Author_Institution :
Dept. of Comput. & New Media Technol., Univ. of Wisconsin-Stevens Point, Stevens Point, WI, USA
fYear :
2011
Firstpage :
118
Lastpage :
123
Abstract :
Interest in querying and ranking XML documents has grown in recent years. In this paper, we present a new framework for querying and ranking schema-less XML documents based on concise summaries of their structural and textual content. We introduce a novel data synopsis structure that summarizes the textual content of an XML document for efficient indexing. More importantly, we extend the traditional vector space model to rank XML documents effectively over the proposed data synopses. We conduct extensive experiments over XML benchmark data to demonstrate the advantages of the indexing scheme and the effectiveness of our ranking scheme. We also compare our framework with Lucene to demonstrate that our extended TF*IDF scoring function is effective.
Keywords :
XML; indexing; query processing; text analysis; TF*IDF scoring function; XML benchmark data; data synopsis structure; schema-less XML document querying; schema-less XML document ranking; structural content; textual content; vector space model; Benchmark testing; Bicycles; Matched filters
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Applications of Digital Information and Web Technologies (ICADIWT), 2011 Fourth International Conference on the
Conference_Location :
Stevens Point, WI
Print_ISBN :
978-1-4244-9824-6
Type :
conf
DOI :
10.1109/ICADIWT.2011.6041404
Filename :
6041404