DocumentCode
2862102
Title
A Rule-Based Framework of Metadata Extraction from Scientific Papers
Author
Guo, Zhixin ; Jin, Hai
Author_Institution
Cluster & Grid Comput. Lab., Huazhong Univ. of Sci. & Technol., Wuhan, China
fYear
2011
fDate
14-17 Oct. 2011
Firstpage
400
Lastpage
404
Abstract
Most scientific documents on the web are unstructured or semi-structured, which makes automatic document metadata extraction an important task. This paper describes a framework for automatic metadata extraction from scientific papers. Based on a spatial and visual knowledge principle, our system extracts the title, authors, and abstract from scientific papers. We use format information such as font size and position to guide the metadata extraction process. The experimental results show that our system achieves high accuracy in header metadata extraction, which can effectively assist automatic index creation for digital libraries.
Keywords
Internet; digital libraries; document handling; indexing; information retrieval; knowledge based systems; meta data; natural sciences computing; Web; automatic document metadata extraction; automatic index creation; digital libraries; header metadata extraction; rule-based framework; scientific documents; scientific papers; spatial knowledge principle; visual knowledge principle; Accuracy; Data mining; Layout; Libraries; Portable document format; Semantics; XML; document metadata; information extraction; rule-based approach
fLanguage
English
Publisher
IEEE
Conference_Titel
Distributed Computing and Applications to Business, Engineering and Science (DCABES), 2011 Tenth International Symposium on
Conference_Location
Wuxi
Print_ISBN
978-1-4577-0327-0
Type
conf
DOI
10.1109/DCABES.2011.14
Filename
6118700