• DocumentCode
    387565
  • Title

    Passage retrieval on Web data

  • Author

    Song, Rui-Hua ; Ma, Shao-Ping ; Zhang, Min

  • Author_Institution
    State Key Lab. of Intelligent Tech. & Syst., Tsinghua Univ., Beijing, China
  • Volume
    3
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    1437
  • Abstract
    On the Web, a single document often covers several independent subtopics, i.e., it is multi-topic. Dividing such a document into passages, each corresponding to only one topic, improves retrieval performance. In this paper, features embedded in the HTML structure are used as evidence for passage segmentation. Experimental results on the TREC-9 10-gigabyte Web dataset show that the 11-point average precision of passage retrieval is higher than that of conventional document retrieval by about 9% on the collection of multi-topic documents and by about 1.6% on the whole document set. Further analysis indicates that the precision would actually be higher if all the documents returned by passage retrieval were assessed.
  • Keywords
    Internet; feature extraction; hypermedia markup languages; information retrieval; HTML structure; Web data; features selection; multiple topic document retrieval; passage retrieval; passage segmentation; HTML; Hidden Markov models
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2002. Proceedings. 2002 International Conference on
  • Print_ISBN
    0-7803-7508-4
  • Type

    conf

  • DOI
    10.1109/ICMLC.2002.1167444
  • Filename
    1167444
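
The abstract reports its results as 11-point average precision. As a minimal sketch of that metric, assuming the standard interpolated definition (the record does not say which evaluation tool or exact procedure the authors used), the function below averages interpolated precision over the recall levels 0.0, 0.1, ..., 1.0. The function name and the document identifiers are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Set


def eleven_point_average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    Assumption: the standard interpolated definition; not taken from the paper.
    """
    if not relevant:
        raise ValueError("need at least one relevant document")

    # Record (recall, precision) each time a relevant document is retrieved.
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: the best precision at any recall >= level.
        # The small tolerance guards against floating-point rounding.
        candidates = [p for r, p in points if r >= level - 1e-12]
        total += max(candidates, default=0.0)
    return total / 11


if __name__ == "__main__":
    # Hypothetical ranking: relevant documents appear at ranks 1 and 3.
    score = eleven_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
    print(round(score, 3))
```

For the hypothetical ranking in the usage example, precision is 1.0 up to recall 0.5 and 2/3 beyond it, so the score is 28/33, about 0.848. Documents that were never assessed count as non-relevant in this computation, which is one way to read the abstract's remark that the measured precision understates the true value.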