DocumentCode :
1668036
Title :
Internet Archives as a Tool for Research: Decay in Large Scale Archival Records
Author :
Nguyen, Hai ; Weber, Matthew S.
Author_Institution :
Dept. of Comput. Sci., Rutgers Univ., New Brunswick, NJ, USA
fYear :
2015
Firstpage :
724
Lastpage :
727
Abstract :
Web archiving provides social scientists and digital humanities researchers with a data source that enables the study of a wealth of historical phenomena. One of the most notable efforts to record the history of the World Wide Web is the Internet Archive (IA) project, which maintains the largest repository of archived data in the world. Understanding the quality of archived data and the completeness of each record of a single website is a central issue for scholarly research, and yet there is no standard record of the provenance of digital archives. Indeed, although present-day records tend to be quite accurate, archived Web content deteriorates as one moves back in time. This paper analyzes a subset of archived Web data, measures the degree of degradation in that subset, and proposes statistical inference to overcome such limitations.
Keywords :
Internet; Web sites; information retrieval systems; records management; IA project; Internet archive project; Web archiving; Web site; World Wide Web; archival records; digital archives; historical phenomena; statistical inference; Big data; Data mining; Degradation; Internet; Libraries; Standards; Uniform resource locators; analytics; archival data; big data; research; statistical validity;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
Type :
conf
DOI :
10.1109/BigDataCongress.2015.118
Filename :
7207302