Title :
Simulation Study of Language Specific Web Crawling
Author :
Somboonviwat, Kulwadee ; Kitsuregawa, Masaru ; Tamura, Takayuki
Author_Institution :
Institute of Industrial Science, University of Tokyo
Abstract :
The Web has been recognized as an important part of our cultural heritage. Many nations have started archiving their national web spaces for future generations. A key data acquisition technology employed by these archiving projects is web crawling. Crawling culturally and/or linguistically specific resources from the borderless Web raises many challenging issues. In this paper, we propose language-specific web crawling and evaluate language-specific crawling strategies on a web crawling simulator.
Keywords :
Crawlers; Cultural differences; Data acquisition; Information technology; Research and development; Service oriented architecture; Space technology; Uniform resource locators; Web pages; Web sites;
Conference_Title :
Data Engineering Workshops, 2005. 21st International Conference on
Print_ISBN :
0-7695-2657-8
DOI :
10.1109/ICDE.2005.282