• DocumentCode
    1634323
  • Title

    Analysis of Book Documents' Table of Content Based on Clustering

  • Author

    Gao, Liangcai ; Tang, Zhi ; Lin, Xiaofan ; Tao, Xin ; Chu, Yimin

  • Author_Institution
    Inst. of Comput. Sci. & Technol., Peking Univ., Beijing, China
  • fYear
    2009
  • Firstpage
    911
  • Lastpage
    915
  • Abstract
    Table of contents (TOC) recognition has attracted a great deal of attention in recent years. After reviewing the merits and drawbacks of the existing TOC recognition methods, we have observed that book documents are multi-page documents with intrinsic local format consistency. Based on this finding we introduce an automatic TOC analysis method through clustering. This method first detects the decorative elements in TOC pages. Then it learns a layout model used in the TOC pages through clustering. Finally, it generates TOC entries and extracts their hierarchical structure under the guidance of the model. More specifically, broken lines are taken into account in the method. Experimental results show that this method achieves high accuracy and efficiency. In addition, this method has been successfully applied in a commercial e-book production software package.
  • Keywords
    document handling; electronic publishing; pattern clustering; book documents; clustering technique; commercial e-book production software package; decorative element detection; intrinsic local format consistency; layout model; multipage documents; table of contents recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2009.143
  • Filename
    5277548