DocumentCode :
3374494
Title :
A new study on using HTML structures to improve retrieval
Author :
Cutler, M. ; Deng, H. ; Maniccam, S.S. ; Meng, W.
Author_Institution :
Dept. of Comput. Sci., State Univ. of New York, Binghamton, NY, USA
fYear :
1999
fDate :
1999
Firstpage :
406
Lastpage :
409
Abstract :
Locating useful information effectively from the World Wide Web (WWW) is of wide interest. This paper presents new results on a methodology that uses the structures and hyperlinks of HTML documents to improve the effectiveness of retrieving HTML documents. The methodology partitions the occurrences of terms in a document collection into classes according to the tags in which a particular term appears (such as Title, H1-H6, and Anchor). The rationale is that terms appearing in different structures of a document may have different significance in identifying the document. The weighting schemes of traditional information retrieval were extended to include class importance values. We implemented a genetic algorithm to determine a “best so far” combination of class importance factors. Our experiments indicate that, using this technique, retrieval effectiveness can be improved by 39.6% or more.
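The class-based weighting and genetic search summarised in the abstract can be illustrated with a small sketch. It assumes a plain tf-idf/cosine retrieval model in which each occurrence of a term contributes the importance value of its tag class instead of 1, and a basic genetic algorithm that searches for the class importance vector maximising mean average precision on a few training queries with known relevant documents. The class names (`title`, `header`, `anchor`, `plain`), the value range [0, 4], the smoothed idf formula, the fitness measure, and the GA operators (truncation selection, uniform crossover, Gaussian mutation) are all illustrative assumptions, not details taken from the paper.

```python
import math
import random
from collections import Counter

# Hypothetical tag classes (assumption): the abstract names Title, H1-H6 and
# Anchor; here H1-H6 share one "header" class and untagged text is "plain".
CLASSES = ("title", "header", "anchor", "plain")


def class_weighted_tf(doc, civ):
    """Each occurrence of a term adds civ[class] to its frequency, not 1."""
    tf = Counter()
    for cls, terms in doc.items():
        for term in terms:
            tf[term] += civ[cls]
    return tf


def inverse_df(docs):
    """Smoothed idf over all classes of each document (always positive)."""
    df = Counter()
    for doc in docs:
        df.update({t for terms in doc.values() for t in terms})
    n = len(docs)
    return {t: math.log(1 + n / d) for t, d in df.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs, civ):
    """Document indices sorted by descending cosine similarity to the query."""
    idf = inverse_df(docs)
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(query).items()}
    scored = []
    for i, doc in enumerate(docs):
        vec = {t: w * idf[t] for t, w in class_weighted_tf(doc, civ).items()}
        scored.append((cosine(q, vec), i))
    return [i for _, i in sorted(scored, key=lambda s: -s[0])]


def average_precision(ranking, relevant):
    hits, total = 0, 0.0
    for k, i in enumerate(ranking, start=1):
        if i in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def fitness(civ, docs, queries):
    """Mean average precision over (query_terms, relevant_ids) training pairs."""
    return sum(average_precision(rank(q, docs, civ), rel)
               for q, rel in queries) / len(queries)


def genetic_search(docs, queries, pop_size=20, generations=30, seed=0):
    """Toy GA over class importance vectors; returns (best_so_far, fitness)."""
    rng = random.Random(seed)

    def fit(civ):
        return fitness(civ, docs, queries)

    # Start from the unweighted baseline (all 1.0) plus random vectors in [0, 4].
    pop = [dict.fromkeys(CLASSES, 1.0)]
    pop += [{c: rng.uniform(0.0, 4.0) for c in CLASSES}
            for _ in range(pop_size - 1)]
    best = max(pop, key=fit)
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        parents = ranked[: max(2, pop_size // 2)]   # truncation selection
        children = [ranked[0]]                      # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = {c: rng.choice((a[c], b[c])) for c in CLASSES}  # uniform crossover
            if rng.random() < 0.3:                  # Gaussian mutation, clipped
                c = rng.choice(CLASSES)
                child[c] = min(4.0, max(0.0, child[c] + rng.gauss(0.0, 0.5)))
            children.append(child)
        pop = children
        best = max([best, *pop], key=fit)           # track the "best so far"
    return best, fit(best)


if __name__ == "__main__":
    docs = [
        {"title": ["genetic"], "plain": ["algorithm", "search"]},
        {"plain": ["genetic", "genetic", "genetic", "cooking"]},
        {"plain": ["cooking", "recipe"]},
    ]
    queries = [(["genetic"], {0})]  # doc 0 is the only relevant document
    print("baseline MAP:", fitness(dict.fromkeys(CLASSES, 1.0), docs, queries))
    print("GA result:", genetic_search(docs, queries))
```

In the toy collection, unweighted counting ranks the document that merely repeats "genetic" in body text above the one that has it in its title; raising the title weight relative to plain text reverses that order. Because the population is seeded with the all-ones baseline and the best-so-far individual is always kept, the returned score cannot fall below the baseline on the training queries; whether a vector found this way generalises to unseen queries is a separate question this sketch does not address.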
Keywords :
genetic algorithms; hypermedia markup languages; information resources; information retrieval; query processing; HTML structures; World Wide Web; class importance values; document collection; genetic algorithm; hyperlinks; retrieval; retrieval effectiveness; retrieving HTML documents; Databases; Electronic switching systems; Frequency; Genetics; HTML; Indexes; Uniform resource locators; Web pages; Web sites; World Wide Web;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Tools with Artificial Intelligence, 1999. Proceedings. 11th IEEE International Conference on
Conference_Location :
Chicago, IL
ISSN :
1082-3409
Print_ISBN :
0-7695-0456-6
Type :
conf
DOI :
10.1109/TAI.1999.809831
Filename :
809831