DocumentCode
3432151
Title
GATE framework based metadata extraction from scientific papers
Author
Huynh, Tin ; Hoang, Kiem
Author_Institution
Dept. of Comput. Sci., Univ. of Inf. Technol., Ho Chi Minh City, Vietnam
fYear
2010
fDate
2-4 Nov. 2010
Firstpage
188
Lastpage
191
Abstract
In this paper we propose a method to automatically extract metadata (title, authors, affiliation, email, references, etc.) from scientific papers by combining the layout information of the papers with rules defined in the JAPE grammar of GATE. After the metadata are extracted from the digital documents, the user can review and correct them before they are exported to XML files. A tool that extracts metadata from digital documents is necessary and useful for building collections and for organizing and searching documents in digital libraries. The extraction method is tested on computer science paper collections selected from international journals and proceedings downloaded from digital libraries such as ACM, IEEE, Springer and CiteSeer.
Keywords
data mining; digital libraries; document handling; GATE framework; JAPE grammar rules; digital document; metadata extraction; scientific paper; Electronic mail; Layout; Libraries; Logic gates; Machine learning; Ontologies; Information extraction; automation; metadata
fLanguage
English
Publisher
IEEE
Conference_Titel
Education and Management Technology (ICEMT), 2010 International Conference on
Conference_Location
Cairo
Print_ISBN
978-1-4244-8616-8
Electronic_ISBN
978-1-4244-8618-2
Type
conf
DOI
10.1109/ICEMT.2010.5657675
Filename
5657675