Title :
STED: a system for topic enumeration and distillation
Author :
Greco, Gianluigi ; Greco, Sergio ; Zumpano, Ester
Author_Institution :
DEIS, Univ. della Calabria, Rende, Italy
Abstract :
Search services on hyperlinked data are becoming popular among users because of the huge amount of data available and the consequent difficulty of retrieving and filtering relevant documents. Traditional term-based search engines are of limited use for this purpose, since the resulting ranking depends on the user's precision in expressing the query. Current research instead takes a different approach, called topic distillation, which consists of finding documents related to the query topic even if they do not contain the query string. Current algorithms for topic distillation first compute a base set containing all the relevant pages and then apply an iterative procedure to obtain the authoritative pages. In this paper we present STED, a system for topic distillation and enumeration (i.e., identification of different communities) of Web documents. The system is based on a technique that computes authoritative pages by analyzing the structure of the base set. More specifically, the system applies a statistical approach to the co-citation matrix associated with the base set to find the most co-cited pages, and it analyzes both the link structure and the content of pages. Several experiments have demonstrated the effectiveness and efficiency of the system.
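To make the co-citation idea in the abstract concrete, here is a minimal sketch in Python. It is not the STED implementation: the abstract does not specify the statistical approach, so the ranking used here (row sums of the co-citation matrix) is a simple stand-in assumed for illustration. The function names `cocitation_matrix` and `most_cocited` and the integer page indices are hypothetical. Two pages are co-cited once for every page in the base set that links to both of them.

```python
def cocitation_matrix(links, n):
    """Build the co-citation matrix for n pages.

    links: iterable of (source, target) page indices (hypothetical input format).
    Entry C[i][j] counts the pages that link to both i and j.
    """
    # inlinks[j] is the set of pages that cite page j (self-links ignored).
    inlinks = [set() for _ in range(n)]
    for src, dst in links:
        if src != dst:
            inlinks[dst].add(src)

    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                C[i][j] = len(inlinks[i] & inlinks[j])
    return C


def most_cocited(C, k):
    """Return the k page indices with the largest total co-citation count.

    Assumption: row sums stand in for the (unspecified) statistical approach.
    Ties are broken by the smaller page index.
    """
    scores = [sum(row) for row in C]
    order = sorted(range(len(C)), key=lambda i: (-scores[i], i))
    return order[:k]


if __name__ == "__main__":
    # Toy base set: pages 0 and 1 act as hubs citing pages 2, 3 and 4.
    links = [(0, 2), (0, 3), (1, 2), (1, 3), (1, 4)]
    C = cocitation_matrix(links, 5)
    print(C[2][3])             # pages 2 and 3 share two citing pages
    print(most_cocited(C, 2))  # the two most co-cited pages
```

In this toy base set, pages 2 and 3 are each cited by both hubs, so they come out as the most co-cited pair; a real system would also have to weigh page content, as the abstract notes.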
Keywords :
citation analysis; information resources; information retrieval; STED; Web documents; co-citation matrix; co-cited pages; document filtering; document retrieval; hyperlinked data; iterative procedure; query string; ranking; search services; statistical approach; topic distillation; topic enumeration; Databases; Filtering; Information processing; Information resources; Information retrieval; Iterative algorithms; Search engines; Statistical analysis; Web sites; World Wide Web
Conference_Title :
Information Technology: Coding and Computing, 2002. Proceedings. International Conference on
Print_ISBN :
0-7695-1506-1
DOI :
10.1109/ITCC.2002.1000405