• DocumentCode
    1668036
  • Title

    Internet Archives as a Tool for Research: Decay in Large Scale Archival Records

  • Author

    Nguyen, Hai ; Weber, Matthew S.

  • Author_Institution
    Dept. of Comput. Sci., Rutgers Univ., New Brunswick, NJ, USA
  • fYear
    2015
  • Firstpage
    724
  • Lastpage
    727
  • Abstract
    Web archiving provides social scientists and digital humanities researchers with a data source that enables the study of a wealth of historical phenomena. One of the most notable efforts to record the history of the World Wide Web is the Internet Archive (IA) project, which maintains the largest repository of archived data in the world. Understanding the quality of archived data and the completeness of each record of a single website is a central issue for scholarly research, and yet there is no standard record of the provenance of digital archives. Indeed, although present-day records tend to be quite accurate, archived Web content deteriorates as one moves back in time. This paper analyzes a subset of archived Web data, measures the degree of degradation in that subset, and proposes statistical inference to overcome such limitations.
  • Keywords
    Internet; Web sites; information retrieval systems; records management; IA project; Internet archive project; Web archiving; Web site; World Wide Web; archival records; digital archives; historical phenomena; statistical inference; Big data; Data mining; Degradation; Internet; Libraries; Standards; Uniform resource locators; analytics; archival data; big data; research; statistical validity;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Big Data (BigData Congress), 2015 IEEE International Congress on
  • Conference_Location
    New York, NY
  • Print_ISBN
    978-1-4673-7277-0
  • Type

    conf

  • DOI
    10.1109/BigDataCongress.2015.118
  • Filename
    7207302