Title :
Extraction and analysis of referenced web links in large-scale scholarly articles
Author :
Zhou, Ke ; Tobin, Richard ; Grover, Claire
Author_Institution :
Univ. of Edinburgh, Edinburgh, UK
Abstract :
In this paper we report on a sub-task undertaken as part of Hiberlink, a project examining the phenomenon of reference rot in scholarly works. In this sub-task we aim to quantify and characterize the links to web resources that are referenced from papers in very large-scale scholarly collections. We first introduce the challenges involved in extracting links from scholarly articles, then develop a set of link extraction systems and evaluate their accuracy. Second, we study five collections containing millions of scholarly articles with different characteristics (across different disciplines, time periods and publication types), and demonstrate that web resources are widely cited in scholarly publications and should therefore be an important concern for digital preservation.
Keywords :
Internet; electronic publishing; information analysis; Hiberlink; Web resources; digital preservation; large-scale scholarly articles; reference rot phenomenon; referenced Web links analysis; referenced Web links extraction; scholarly publications; scholarly works; very large-scale scholarly collections; Accuracy; Educational institutions; Libraries; Portable document format; Prototypes; Uniform resource locators; XML; Digital Preservation; Link Extraction; Scholarly Data;
Conference_Title :
Digital Libraries (JCDL), 2014 IEEE/ACM Joint Conference on
Conference_Location :
London
DOI :
10.1109/JCDL.2014.6970220