DocumentCode
1645554
Title
Text block recognition from TIFF images
Author
Lovegrove, William ; Elliman, David
Author_Institution
Nottingham Univ., UK
fYear
1995
fDate
11/2/1995 12:00:00 AM
Firstpage
4/1
Lastpage
4/6
Abstract
The reproduction of a scanned document should capture not only the text itself, through optical character recognition, but also the structure of that text on the page and its appearance (i.e. font recognition). This paper presents an algorithm which recognises the structure of the text in a page image. The method is based upon the “Docstrum plot” algorithm by L. O'Gorman (1993). Modifications have been made to O'Gorman's algorithm which give very good results, particularly in identifying paragraphs and lines. The implementation can, to a limited degree, describe the logical relationship between the text elements of the original page. Its limitations stem from the lack of information available when OCR and font technology are not incorporated into the implementation. The implementation has a graphical interface which portrays the state of the algorithm during the decomposition process.
Keywords
document image processing; optical character recognition; pattern recognition; Docstrum plot algorithm; OCR; TIFF image; algorithm; font recognition; graphical interface; page image; page layout; scanned document; text block recognition; text recognition; text structure
fLanguage
English
Publisher
IET
Conference_Titel
Document Image Processing and Multimedia Environments, IEE Colloquium on
Conference_Location
London
Type
conf
DOI
10.1049/ic:19951185
Filename
498878