• DocumentCode
    1740885
  • Title
    Analysis, understanding and representation of Chinese newspaper with complex layout
  • Author
    Chen, Ming ; Ding, Xiaoqing ; Liang, Jian
  • Author_Institution
    Dept. of Electron. Eng., Tsinghua Univ., Beijing, China
  • Volume
    2
  • fYear
    2000
  • fDate
    10-13 Sept. 2000
  • Firstpage
    590
  • Abstract
    Layout analysis, understanding and representation are important problems when transforming a paper document into its electronic version. For Chinese newspapers with complex layouts, a bottom-up layout analysis algorithm based on nearest neighbor connect-strength and line confidence is proposed. We also propose a rule-based growing algorithm for layout understanding and discuss the implementation of layout representation. Combining these algorithms with a Chinese OCR engine, we developed a complete system that can be used for automatic electronic publishing. Experimental results and a system in practical operation show that the algorithms are efficient and practical.
  • Keywords
    document image processing; electronic publishing; knowledge based systems; optical character recognition; Chinese OCR engine; Chinese newspaper; automatic electronic publishing; bottom-up algorithm; complex layout; electronic document; layout analysis; layout representation; layout understanding; line confidence; nearest neighbor connect-strength; paper document; rule-based growing algorithm; Algorithm design and analysis; CD-ROMs; Character recognition; Electronic publishing; Engines; Image analysis; Image converters; Information analysis; Nearest neighbor searches; Optical character recognition software;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Image Processing, 2000. Proceedings. 2000 International Conference on
  • Conference_Location
    Vancouver, BC, Canada
  • ISSN
    1522-4880
  • Print_ISBN
    0-7803-6297-7
  • Type
    conf
  • DOI
    10.1109/ICIP.2000.899500
  • Filename
    899500