DocumentCode
1668036
Title
Internet Archives as a Tool for Research: Decay in Large Scale Archival Records
Author
Hai Nguyen ; Weber, Matthew S.
Author_Institution
Dept. of Comput. Sci., Rutgers Univ., New Brunswick, NJ, USA
fYear
2015
Firstpage
724
Lastpage
727
Abstract
Web archiving provides social scientists and digital humanities researchers with a data source that enables the study of a wealth of historical phenomena. One of the most notable efforts to record the history of the World Wide Web is the Internet Archive (IA) project, which maintains the largest repository of archived data in the world. Understanding the quality of archived data and the completeness of each record of a single website is a central issue for scholarly research, and yet there is no standard record of the provenance of digital archives. Indeed, although present-day records tend to be quite accurate, archived Web content deteriorates as one moves back in time. This paper analyzes a subset of archived Web data, measures the degree of degradation in that subset, and proposes statistical inference to overcome such limitations.
Keywords
Internet; Web sites; information retrieval systems; records management; IA project; Internet archive project; Web archiving; Web site; World Wide Web; archival records; digital archives; historical phenomena; statistical inference; Big data; Data mining; Degradation; Internet; Libraries; Standards; Uniform resource locators; analytics; archival data; big data; research; statistical validity;
fLanguage
English
Publisher
IEEE
Conference_Titel
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location
New York, NY
Print_ISBN
978-1-4673-7277-0
Type
conf
DOI
10.1109/BigDataCongress.2015.118
Filename
7207302