  • DocumentCode
    2145993
  • Title

    Metadata Extraction System for Chinese Books

  • Author

    Gao, Liangcai ; Zhong, Yuan ; Tang, Yingmin ; Tang, Zhi ; Lin, Xiaofan ; Hu, Xuan

  • Author_Institution
    Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
  • fYear
    2011
  • fDate
    18-21 Sept. 2011
  • Firstpage
    749
  • Lastpage
    753
  • Abstract
    Extracting metadata from academic papers has attracted much attention from researchers in recent years, but automatic metadata extraction from books is still seldom discussed. In this paper, we address this task for Chinese books and present a system that extracts metadata from the title page of a book. The system consists of three components: metadata segmentation, metadata labeling, and post-processing. Different strategies are adopted to identify different metadata types, and a variety of information sources, including geometric layout, linguistic information, semantic content, and header-footer information, are used to accommodate the wide range of metadata layouts. Experimental results on real-world data demonstrate the effectiveness of the proposed system.
  • Keywords
    electronic publishing; information retrieval; learning (artificial intelligence); meta data; Chinese books; academic papers; geometric layout; header-footer; information sources; linguistic; metadata extraction system; metadata labeling; metadata segmentation; post-processing; real-world data; semantic content; Accuracy; Books; Computer architecture; Data mining; Labeling; Microprocessors; Support vector machines; electronic book; metadata extraction; page segmentation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2011 International Conference on
  • Conference_Location
    Beijing
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4577-1350-7
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2011.156
  • Filename
    6065411