DocumentCode :
1544042
Title :
A fast algorithm for bottom-up document layout analysis
Author :
Simon, Aniko ; Pret, Jean Christophe ; Johnson, A. Peter
Author_Institution :
Sch. of Chem., Leeds Univ., UK
Volume :
19
Issue :
3
fYear :
1997
fDate :
3/1/1997
Firstpage :
273
Lastpage :
277
Abstract :
This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLIDE (Chemical Literature Data Extraction) system, but the method described here is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance metric between the components to construct the physical page structure. The method has all the major advantages of bottom-up systems: independence from different text spacing and independence from different block alignments. The algorithm's computational complexity is reduced to linear by using heuristics and path compression.
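The abstract names the ingredients (Kruskal's algorithm, a distance metric between page components, path compression) but does not define the metric. The following is a minimal sketch under stated assumptions, not the CLIDE method: components are assumed to be axis-aligned bounding boxes, the distance `gap` is a hypothetical max-of-gaps metric, and merging stops at a fixed, hypothetical `threshold`. Because it compares all pairs of boxes, this sketch runs in O(n² log n) time rather than the linear time the authors report from their heuristics.

```python
from itertools import combinations

# Hypothetical component representation: (x0, y0, x1, y1) bounding box.
Box = tuple[float, float, float, float]


def find(parent: list[int], i: int) -> int:
    """Union-find lookup with path compression."""
    root = i
    while parent[root] != root:
        root = parent[root]
    # Re-point every node on the walked path directly at the root.
    while parent[i] != root:
        nxt = parent[i]
        parent[i] = root
        i = nxt
    return root


def gap(a: Box, b: Box) -> float:
    """Assumed metric: the larger of the horizontal and vertical gaps
    between two boxes (0 when they overlap on both axes)."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return max(dx, dy)


def group(boxes: list[Box], threshold: float) -> list[list[int]]:
    """Kruskal-style grouping: take edges in increasing distance order and
    merge components until the distance exceeds the (assumed) threshold."""
    parent = list(range(len(boxes)))
    edges = sorted(
        (gap(boxes[i], boxes[j]), i, j)
        for i, j in combinations(range(len(boxes)), 2)
    )
    for d, i, j in edges:
        if d > threshold:
            break
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[rj] = ri
    clusters: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        clusters.setdefault(find(parent, i), []).append(i)
    return list(clusters.values())


# Three boxes on one text line and one box far below them.
boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (24, 0, 34, 10), (0, 40, 10, 50)]
print(group(boxes, threshold=3))  # [[0, 1, 2], [3]]
```

Stopping Kruskal's algorithm at a distance cutoff yields single-linkage clusters, which is one plausible reading of how a minimum-spanning-tree construction can build blocks bottom-up; the paper's actual metric and stopping rule may differ.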
Keywords :
computational complexity; document image processing; heuristic programming; image segmentation; CLIDE; Chemical Literature Data Extraction; block alignments; bottom-up document layout analysis; fast algorithm; heuristics; path-compression; physical page structure; text spacing; Algorithm design and analysis; Chemical analysis; Chemical processes; Data mining; Graphics; Image segmentation; Independent component analysis; Layout; Optical character recognition software; Text analysis
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/34.584106
Filename :
584106