Title :
PageChaser: A Tool for the Automatic Correction of Broken Web Links
Author :
Morishima, Atsuyuki ; Nakamizo, Akiyoshi ; Iida, Tomoharu ; Sugimoto, Shigeo ; Kitagawa, Hiroyuki
Author_Institution :
Univ. of Tsukuba, Tsukuba
Abstract :
PageChaser is a system that monitors links between Web pages and, when it finds a broken link, searches for the new location of the moved page. Searching for moved pages differs from typical information retrieval problems. First, the final destination of a page cannot be identified until the page has actually moved, so the index-server approach is not necessarily effective. Second, there is a strong bias in where the new address is likely to be, so crawler-based solutions can be implemented effectively without searching the entire Web. PageChaser incorporates a comprehensive set of heuristics, some of which are novel, in a single unified framework. This paper explains the ideas underlying the design and development of PageChaser.
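The record does not describe PageChaser's actual heuristics. The hypothetical Python sketch below is not the authors' method. It only illustrates the abstract's second point, that a locality bias lets a small, bounded crawl stand in for a Web-wide search. The sketch starts a breadth-first crawl from the directories above a broken URL, stays on the same host, and stops at the first page that a caller-supplied predicate accepts. The names `ancestor_urls`, `chase`, `fetch`, `looks_like_target`, and `max_pages` are all assumptions. `fetch` is injected so the sketch runs against an in-memory stand-in for the Web instead of the network.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlparse

# Hypothetical fetch signature: returns (page text, outgoing hrefs), or None if missing.
Fetch = Callable[[str], Optional[Tuple[str, List[str]]]]


def ancestor_urls(url: str) -> List[str]:
    """Directory URLs above `url`, nearest first (assumed crawl starting points)."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    base = f"{parsed.scheme}://{parsed.netloc}"
    result = []
    # Drop the last segment (the missing page itself), then walk toward the root.
    for i in range(len(parts) - 1, -1, -1):
        result.append(base + "/" + "/".join(parts[:i]) + ("/" if i else ""))
    return result


def chase(broken_url: str, fetch: Fetch,
          looks_like_target: Callable[[str], bool],
          max_pages: int = 50) -> Optional[str]:
    """Bounded same-host BFS near the broken URL; returns a candidate new URL or None."""
    host = urlparse(broken_url).netloc
    queue = deque(ancestor_urls(broken_url))
    seen = set(queue)
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        page = fetch(url)
        visited += 1  # every fetch attempt counts against the budget
        if page is None:
            continue
        content, links = page
        if url != broken_url and looks_like_target(content):
            return url
        for href in links:
            nxt = urljoin(url, href)
            # Stay on the original host: the assumed locality bias.
            if urlparse(nxt).netloc == host and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


# Usage against a tiny in-memory "Web" (hypothetical data).
web = {
    "http://ex.org/docs/": ("index", ["v2/"]),
    "http://ex.org/docs/v2/": ("v2 index", ["guide.html"]),
    "http://ex.org/docs/v2/guide.html": ("PageChaser user guide", []),
}
found = chase("http://ex.org/docs/guide.html", web.get,
              lambda text: "user guide" in text)
print(found)  # http://ex.org/docs/v2/guide.html
```

Starting from ancestor directories and capping the number of fetches reflects the abstract's premise that moved pages tend to land near their old address. A real system would need a far better `looks_like_target` than a substring test, and that matching step is where heuristics such as PageChaser's would apply.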
Keywords :
Internet; information retrieval; search engines; PageChaser; Web pages; crawler-based solution; index-server approach; Content management; Databases; Indexes; Project management; Software development management; Software tools; Uniform resource locators; World Wide Web;
Conference_Titel :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
DOI :
10.1109/ICDE.2008.4497598