Title :
A Strategy for Automatically Extracting References from PDF Documents
Author :
Alves, Neide Ferreira ; Lins, Rafael Dueire ; Lencastre, Maria
Author_Institution :
Univ. do Estado do Amazonas, Manaus, Brazil
Abstract :
The number of citations an author receives is steadily becoming more important than the length of their publication list. The automatic extraction of bibliographic references from scientific articles is still a difficult problem in Document Engineering, even when the document is originally in digital form. This paper presents a strategy for extracting references from scientific documents in PDF format. The proposed scheme was validated on the LiveMemory platform, which was developed to generate digital libraries of the proceedings of technical events.
Keywords :
bibliographic systems; digital libraries; document image processing; image retrieval; scientific information systems; LiveMemory platform; PDF document; automatic bibliographic reference extraction; digital document; document engineering; scientific articles; scientific documents; Accuracy; Classification algorithms; Data mining; Portable document format; Proposals; Support vector machine classification; Training; bibliographic references; document processing; information extraction; learning; regular expression
Conference_Title :
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
Conference_Location :
Gold Coast, QLD
Print_ISBN :
978-1-4673-0868-7
DOI :
10.1109/DAS.2012.12