Title :
Comprehensive Global Typography Extraction System for Electronic Book Documents
Author :
Gao, Liangcai ; Tang, Zhi ; Lin, Xiaofan ; Qiu, Ruiheng
Author_Institution :
Inst. of Comput. Sci. & Technol., Peking Univ., Beijing
Abstract :
Book documents usually have consistent typographies throughout the whole book, including headers, footers, columns, text line directions, and the fonts used at each level of headings. Such document-level typography information is of great value for downstream document processing applications. This paper presents a document analysis system that can extract a comprehensive set of typographies used in book documents. The system consists of several components: recognition of fonts used in the body text and chapter headings; detection of the page body area, headers, and footers; and detection of columns, text line direction, and line spacing of body text. Page-association is employed in the system. Preliminary experimental results demonstrate the effectiveness of the system.
Keywords :
character sets; document image processing; electronic publishing; information retrieval; information retrieval systems; text analysis; document analysis system; document processing application; electronic book document; font recognition; text line spacing; typography extraction system; Application software; Books; Computer science; Data mining; Electronic publishing; Image analysis; Information analysis; Sections; Text analysis; Text recognition
Conference_Title :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara
Print_ISBN :
978-0-7695-3337-7
DOI :
10.1109/DAS.2008.30