• DocumentCode
    3432151
  • Title

    GATE framework based metadata extraction from scientific papers

  • Author

    Huynh, Tin ; Hoang, Kiem

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Inf. Technol., Ho Chi Minh City, Vietnam
  • fYear
    2010
  • fDate
    2-4 Nov. 2010
  • Firstpage
    188
  • Lastpage
    191
  • Abstract
    In this paper we propose a method for automatically extracting metadata (title, authors, affiliation, email, references, etc.) from scientific papers by combining the layout information of the papers with rules defined using the JAPE grammar of GATE. After the metadata has been extracted automatically from digital documents, users can review and correct it before it is exported to XML files. A tool for extracting metadata from digital documents is necessary and useful for building collections and for organizing and searching documents in digital libraries. The extraction method is tested on collections of computer science papers selected from international journals and proceedings downloaded from digital libraries such as ACM, IEEE, Springer and CiteSeer.
  • Keywords
    data mining; digital libraries; document handling; GATE framework; JAPE grammar rules; digital document; digital libraries; metadata extraction; scientific paper; Data mining; Electronic mail; Layout; Libraries; Logic gates; Machine learning; Ontologies; Information extraction; automation; metadata;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Education and Management Technology (ICEMT), 2010 International Conference on
  • Conference_Location
    Cairo
  • Print_ISBN
    978-1-4244-8616-8
  • Electronic_ISBN
    978-1-4244-8618-2
  • Type

    conf

  • DOI
    10.1109/ICEMT.2010.5657675
  • Filename
    5657675