DocumentCode :
3211405
Title :
Applying site information to information retrieval from the Web
Author :
Asano, Yasuhito ; Imai, Hiroshi ; Toyoda, Masashi ; Kitsuregawa, Masaru
Author_Institution :
Graduate Sch. of Sci., Univ. of Tokyo, Japan
fYear :
2002
fDate :
12-14 Dec. 2002
Firstpage :
83
Lastpage :
92
Abstract :
In recent years, several information retrieval methods that use information about Web-links, such as HITS and trawling, have been developed. To analyze Web-links for information retrieval by dividing them into links inside each Web site (local-links) and links between Web sites (global-links), a proper model of the Web site is required. In existing research, a Web server is used as a model of the Web site. This idea works relatively well when a Web site corresponds to a server, as is the case for public Web sites, but works poorly when multiple Web sites correspond to a server, as is the case for private Web sites on rental Web servers. We propose a new model of the Web site, the "directory-based site", to handle typical private sites, and a method to identify such sites using information about the URL and Web-links. Through computational experiments on jp-domain URL and Web-link data containing over 23 million URLs and 100 million Web-links, collected from July to August 2000 by Toyoda and Kitsuregawa, we verify that the method can approximately identify, at a rate of 66% of over 110,000 servers, whether each server has multiple directory-based sites or not, and we extract over 500,000 directory-based sites and 4 million global-links. We also propose a new framework of Web-link based information retrieval that uses directory-based sites and global-links instead of Web pages and whole Web-links respectively, and examine the effectiveness of our framework by comparing the result of trawling on our framework with that on the existing framework.
Keywords :
Web sites; information retrieval; HITS; URL; Web links; Web server; directory based site; global links; information retrieval methods; jp-domain URLs; local links; private sites; trawling; Gold; Information analysis; Information science; Search engines; Search methods; Toy industry; Uniform resource locators; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Web Information Systems Engineering, 2002. WISE 2002. Proceedings of the Third International Conference on
Print_ISBN :
0-7695-1766-8
Type :
conf
DOI :
10.1109/WISE.2002.1181646
Filename :
1181646