  • DocumentCode
    2862102
  • Title
    A Rule-Based Framework of Metadata Extraction from Scientific Papers
  • Author
    Guo, Zhixin; Jin, Hai
  • Author_Institution
    Cluster & Grid Comput. Lab., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • fYear
    2011
  • fDate
    14-17 Oct. 2011
  • Firstpage
    400
  • Lastpage
    404
  • Abstract
    Most scientific documents on the web are unstructured or semi-structured, and automatic document metadata extraction has become an important task. This paper describes a framework for automatic metadata extraction from scientific papers. Based on a spatial and visual knowledge principle, our system extracts the title, authors, and abstract from scientific papers. We use format information such as font size and position to guide the metadata extraction process. The experimental results show that our system achieves high accuracy in header metadata extraction, which can effectively assist automatic index creation for digital libraries.
  • Keywords
    Internet; digital libraries; document handling; indexing; information retrieval; knowledge based systems; meta data; natural sciences computing; Web; automatic document metadata extraction; automatic index creation; digital libraries; header metadata extraction; rule-based framework; scientific documents; scientific papers; spatial knowledge principle; visual knowledge principle; Accuracy; Data mining; Layout; Libraries; Portable document format; Semantics; XML; document metadata; information extraction; rule-based approach
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Distributed Computing and Applications to Business, Engineering and Science (DCABES), 2011 Tenth International Symposium on
  • Conference_Location
    Wuxi
  • Print_ISBN
    978-1-4577-0327-0
  • Type
    conf
  • DOI
    10.1109/DCABES.2011.14
  • Filename
    6118700
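
The abstract above mentions using format information such as font size and position to locate the title, authors, and abstract. The following is a minimal sketch of that general idea, not the authors' actual rule set, which this record does not describe. The `TextLine` type, the `extract_header` function, and the three rules (largest font is the title, the abstract begins at a line starting with "Abstract", authors sit between the two) are all assumptions made for illustration, and the input is assumed to be first-page text lines already extracted with their font size and vertical position.

```python
from dataclasses import dataclass


@dataclass
class TextLine:
    """One line of first-page text (hypothetical input format)."""
    text: str
    font_size: float  # in points
    y: float          # distance from the top of the page, in points


def extract_header(lines):
    """Return title, authors and abstract using font size and vertical order."""
    empty = {"title": "", "authors": "", "abstract": ""}
    if not lines:
        return empty
    ordered = sorted(lines, key=lambda ln: ln.y)

    # Assumed rule 1: the title is set in the largest font on the page.
    max_size = max(ln.font_size for ln in ordered)
    title_lines = [ln for ln in ordered if ln.font_size == max_size]
    title_bottom = max(ln.y for ln in title_lines)

    # Assumed rule 2: the abstract starts at the first line that begins
    # with the word "Abstract" (case-insensitive).
    start = next(
        (i for i, ln in enumerate(ordered)
         if ln.text.strip().lower().startswith("abstract")),
        len(ordered),
    )
    abstract_top = ordered[start].y if start < len(ordered) else float("inf")

    # Assumed rule 3: author lines lie between the title and the abstract.
    author_lines = [ln for ln in ordered if title_bottom < ln.y < abstract_top]

    # The abstract runs until a keywords heading or the end of the input.
    abstract_lines = []
    for ln in ordered[start:]:
        if ln.text.strip().lower().startswith(("keywords", "index terms")):
            break
        abstract_lines.append(ln)
    abstract = " ".join(ln.text.strip() for ln in abstract_lines)
    if abstract_lines:
        # Drop the leading "Abstract" label and any separator after it.
        abstract = abstract[len("abstract"):].lstrip(" :-.")

    return {
        "title": " ".join(ln.text.strip() for ln in title_lines),
        "authors": " ".join(ln.text.strip() for ln in author_lines),
        "abstract": abstract,
    }
```

A real system would need more rules than this, for example to handle multi-column layouts, affiliations mixed in with author names, or papers whose title is not the largest text on the page.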