DocumentCode
1544042
Title
A fast algorithm for bottom-up document layout analysis
Author
Simon, Aniko ; Pret, Jean Christophe ; Johnson, A. Peter
Author_Institution
Sch. of Chem., Leeds Univ., UK
Volume
19
Issue
3
fYear
1997
fDate
3/1/1997 12:00:00 AM
Firstpage
273
Lastpage
277
Abstract
This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLIDE (Chemical Literature Data Extraction) system, but the method described here is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance metric between the components to construct the physical page structure. The method has all the major advantages of bottom-up systems: independence from different text spacing and independence from different block alignments. The algorithm's computational complexity is reduced to linear by using heuristics and path compression.
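As a rough illustration of the idea in the abstract, the sketch below groups component bounding boxes with a Kruskal-style merge over a union-find structure with path compression. It is not the CLIDE implementation: the abstract does not give the paper's distance metric or heuristics, so `box_gap`, `cluster_boxes`, and the `max_gap` threshold are hypothetical stand-ins, and this naive version sorts all pairs rather than running in linear time.

```python
from itertools import combinations


def find(parent, i):
    """Return the root of i, compressing the path along the way."""
    root = i
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[i] != root:
        parent[i], i = root, parent[i]
    return root


def box_gap(a, b):
    """Assumed distance: largest axis-aligned gap between two boxes.

    Boxes are (x0, y0, x1, y1); overlapping boxes have gap 0.
    This is a placeholder, not the paper's special distance metric.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def cluster_boxes(boxes, max_gap):
    """Merge boxes in order of increasing gap, Kruskal-style."""
    parent = list(range(len(boxes)))
    # All candidate edges, shortest first (quadratic in this sketch).
    edges = sorted(
        (box_gap(boxes[i], boxes[j]), i, j)
        for i, j in combinations(range(len(boxes)), 2)
    )
    for gap, i, j in edges:
        if gap > max_gap:
            break  # remaining edges are longer; stop merging
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri  # join the two clusters
    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (12, 0, 20, 10),
             (100, 0, 110, 10), (113, 0, 120, 10)]
    print(cluster_boxes(boxes, max_gap=5))
```

Stopping the merge at a gap threshold yields a forest whose trees act as candidate layout blocks; a real system would restrict candidate edges to nearby components instead of enumerating every pair.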
Keywords
computational complexity; document image processing; heuristic programming; image segmentation; CLIDE; Chemical Literature Data Extraction; block alignments; bottom-up document layout analysis; fast algorithm; heuristics; path-compression; physical page structure; text spacing; Algorithm design and analysis; Chemical analysis; Chemical processes; Data mining; Graphics; Independent component analysis; Layout; Optical character recognition software; Text analysis
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
IEEE
ISSN
0162-8828
Type
jour
DOI
10.1109/34.584106
Filename
584106