DocumentCode
387565
Title
Passage retrieval on Web data
Author
Song, Rui-Hua; Ma, Shao-Ping; Zhang, Min
Author_Institution
State Key Lab. of Intelligent Tech. & Syst., Tsinghua Univ., Beijing, China
Volume
3
fYear
2002
fDate
2002
Firstpage
1437
Abstract
On the Web, a single document often contains several independent subtopics; that is, it is multi-topic. For such a document, dividing it into passages that each correspond to a single topic improves retrieval performance. In this paper, features embedded in the HTML structure are used as evidence for passage segmentation. Experimental results on the TREC-9 10-gigabyte Web dataset show that the 11-point average precision of passage retrieval is higher than that of conventional document retrieval by about 9% on the collection of multi-topic documents and by about 1.6% on the whole document set. Further analysis indicates that the actual precision is even higher if all documents returned by passage retrieval are assessed.
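The abstract compares runs by 11-point average precision. The sketch below shows the standard interpolated version of that metric for a single query. It is not taken from the paper, and the function name `eleven_point_average_precision` is hypothetical. Interpolated precision at each recall level 0.0, 0.1, ..., 1.0 is the maximum precision at any rank whose recall is at least that level, and the 11 values are averaged. The sketch assumes that unjudged documents are passed in as non-relevant (`False`), as trec_eval does. That convention is why, as the abstract notes, passage retrieval can look worse than it is when some of the documents it returns were never assessed.

```python
def eleven_point_average_precision(ranked_relevance, num_relevant):
    """Interpolated 11-point average precision for one query (illustrative sketch).

    ranked_relevance: list of bools in rank order; True if that document is
        judged relevant. Unjudged documents are assumed to be passed as False.
    num_relevant: total number of relevant documents for the query, including
        relevant documents the run never retrieved.
    """
    if num_relevant <= 0:
        return 0.0

    # Record (hits, precision) after each rank; recall is hits / num_relevant.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits, hits / rank))

    total = 0.0
    for i in range(11):  # recall levels 0.0, 0.1, ..., 1.0
        # Interpolated precision: max precision at any rank with recall >= i/10.
        # Compare hits/num_relevant >= i/10 in integers to avoid float rounding.
        candidates = [p for h, p in points if h * 10 >= i * num_relevant]
        total += max(candidates) if candidates else 0.0
    return total / 11


# One query with 2 relevant documents, retrieved at ranks 1 and 3.
print(eleven_point_average_precision([True, False, True, False, False], 2))
```

For this example the interpolated precision is 1.0 at recall levels 0.0 to 0.5 and 2/3 at levels 0.6 to 1.0, giving 28/33 (about 0.848). A run-level figure such as the ones in the abstract is normally the mean of this per-query value over all topics.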
Keywords
Internet; feature extraction; hypermedia markup languages; information retrieval; HTML structure; Web data; feature selection; multiple topic document retrieval; passage retrieval; passage segmentation; HTML; Hidden Markov models
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
Print_ISBN
0-7803-7508-4
Type
conf
DOI
10.1109/ICMLC.2002.1167444
Filename
1167444