Title :
Tools and techniques for harvesting the World Wide Web
Author :
Marill, Jennifer L. ; Boyko, Andrew ; Ashenfelder, Michael ; Graham, Laura
Author_Institution :
Office of Strategic Initiatives, Libr. of Congress, Washington, DC, USA
Abstract :
Recently the Library of Congress began developing a strategy for the preservation of digital content. Efforts have focused on the need to select, harvest, describe, access and preserve Web resources. This poster focuses on the Library's initial investigation and evaluation of Web harvesting software tools.
Keywords :
Internet; content management; digital libraries; information retrieval systems; software tools; Library of Congress; Web archiving; Web harvesting software tool; World Wide Web; digital content preservation; Benchmark testing; Crawlers; Large-scale systems; Linux; Relational databases; Scalability; Software libraries; Web sites
Conference_Title :
Digital Libraries, 2004. Proceedings of the 2004 Joint ACM/IEEE Conference on
Print_ISBN :
1-58113-832-6
DOI :
10.1109/JCDL.2004.1336207